History of CLU

The development of CLU began in January 1974. By the summer of 1975, the first version of
the language had been completed. Over the next two years, the entire language design was
reviewed and two implementations were produced. Based on this review, and on the experience
gained in using CLU, a second version of the language was designed in the fall of 1977, and a new
implementation is now complete. A preliminary version of this manual appeared in July 1978. 
Since that time, an additional statement for exception handling, an own variable mechanism, and 
three new basic type generators have been added to the language, and a number of minor changes 
have been made to the I/O facilities. 

Guide to the Manual 

This document serves both as an introduction to CLU and as a language reference manual. 
Sections 1 through 4 present an overview of the language. These sections highlight the essential 
features of CLU, and discuss how CLU differs from other, more conventional, languages. Sections
5 through 13 form the reference manual proper. These sections describe each aspect of CLU in
detail, and discuss the proper use of various features. Appendices I through III provide concise
summaries of CLU's syntax, data types, and I/O facilities. Appendix IV contains example
programs.

Those readers wanting an introduction to CLU should read Sections 1 through 13 in order,
concentrating on Sections 1 through 4, 8, 9, and 13. (A brief introduction may be found in
[Liskov77].) Appendix IV should also be of interest. After becoming familiar with CLU, readers
can answer specific questions by consulting Sections 5 through 13 and Appendices I through III.

We would greatly appreciate receiving comments on both the language and this manual. 
Comments should be sent to Barbara Liskov, Laboratory for Computer Science, Massachusetts
Institute of Technology, 545 Technology Square, Cambridge, MA 02139.

[Liskov77] Liskov, B., Snyder, A., Atkinson, R., and Schaffert, C. Abstraction Mechanisms in
CLU. Comm. ACM 20, 8 (Aug. 1977), 564-576.

Keywords: programming languages, data abstractions, strong type checking, modularity, exception 
handling, iteration abstractions, CLU 
1. Modules

A CLU program consists of a group of modules. Three kinds of modules are provided, one 
for each kind of abstraction that we have found to be useful in program construction. Procedures 
support procedural abstraction, iterators support control abstraction, and clusters support data 
abstraction. 

1.1 Procedures 

A procedure performs an action on zero or more argument objects, and terminates returning 
zero or more result objects. All communication between a procedure and its invoker generally takes 
place through these arguments and results; a procedure has no global variables unless it is defined 
in a cluster that has own variables. A procedure may retain objects from one invocation to the 
next through the use of local own variables. 

A procedure may terminate in one of a number of conditions. One of these is the normal 
condition; the others are exceptional conditions. Differing numbers and types of results may be 
returned in different conditions. All information about the names of conditions and the number 
and types of arguments and results is described in the procedure heading. For example, 

    square_root = proc (x: real) returns (real) signals (no_real_result)
is the heading of a square_root procedure, which takes a single real argument. Square_root
terminates either in the normal condition (returning the square root of x) or in the no_real_result
condition (returning no results).
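
As an illustration only, a complete procedure with this heading might be sketched as follows. The
body (Newton's method with a fixed number of iterations, giving an approximation) and the calling
fragment are assumptions made for this example, not the definition of any library routine.
Comments begin with %.

    square_root = proc (x: real) returns (real) signals (no_real_result)
        if x < 0.0 then signal no_real_result end    % exceptional condition, no results
        if x = 0.0 then return (0.0) end
        guess: real := x
        for i: int in int$from_to(1, 100) do         % iteration count chosen arbitrarily
            guess := (guess + x / guess) / 2.0
            end
        return (guess)                               % normal condition, one real result
        end square_root

A caller might handle the exceptional condition with an except statement (y is a hypothetical
real variable):

    r: real
    r := square_root(y)
        except when no_real_result: r := 0.0 end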

1.2 Iterators 

An iterator computes a sequence of items based on its input arguments. These items are 
provided to its invoker one at a time. Each item consists of zero or more objects. 

An iterator is invoked by a for statement. The iterator provides each item by yielding it. The 
objects in the item are assigned to the loop variables of the for statement, and the body of the for 
statement is executed. Then control is returned to the iterator so it can yield the next item in the 
sequence. The for loop is terminated when the iterator terminates, or the for loop body may 
explicitly terminate itself and the iterator. 

Just like a procedure, an iterator has no global variables unless it is defined in a cluster that 
has own variables. An iterator may retain objects from one invocation to the next through the use 
of local own variables. An iterator may also terminate in one of a number of conditions. In the 
normal condition, no results can be returned, but different numbers and types of results can be 
returned in the exceptional conditions. All information about the names of conditions, and the 
number and types of arguments and results is described in the iterator heading. For example, 

    leaves = iter (t: tree) yields (node)
is the heading for an iterator that produces all leaf nodes of a tree object. This iterator might be
used in a for statement as follows:

    for leaf: node in leaves(x) do
        ... examine(leaf) ...
        end
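
The heading alone does not show how items are produced. As a sketch under our own assumptions
(the iterator evens and the variable sum are hypothetical, and are not part of CLU), the following
iterator yields the even integers between lo and hi:

    evens = iter (lo, hi: int) yields (int)
        i: int := lo
        if i // 2 ~= 0 then i := i + 1 end    % // is the integer remainder operation
        while i <= hi do
            yield (i)                         % suspend; the loop body runs with this item
            i := i + 2
            end
        end evens

A loop using it can terminate both itself and the iterator early with a break statement:

    sum: int := 0
    for n: int in evens(1, 100) do
        if n > 10 then break end              % terminates the loop and the iterator
        sum := sum + n
        end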

1.3 Clusters 

A cluster implements a data abstraction, which is a set of objects and a set of primitive 
operations to create and manipulate those objects. The operations can be either procedural or 
control abstractions. The cluster heading states what operations are available, e.g., 

    int_set = cluster is create, insert, elements
states that the operations of int_set are create, insert, and elements. 

A cluster is used to implement a distinct data type, different from all others. Users of this type 
are constrained to treat objects of the type abstractly. That is, the objects may be manipulated only 
via the primitive operations. This means that information about how the objects are actually 
represented in storage may not be used. 

Inside the cluster, a concrete representation (in terms of some other type) is chosen for the 
objects, and the operations are implemented in terms of this representation. Each operation is 
implemented by a routine (a procedure or iterator); these routines are exactly like those not 
contained in clusters, except that they can treat the objects being defined by the cluster both 
abstractly and in terms of the concrete representation. (The ability to treat objects abstractly is 
useful when defining recursive data structures, where the concrete representation makes use of the 
new type.) A cluster may contain additional procedures and iterators, which are purely for local 
use; these routines do not define operations of the type. The routines in a cluster are not 
considered to be separate modules; they are simply part of the cluster module.

A cluster may also contain own variables, whose lifetimes are independent of routine
activations. These variables are globally available to all routines in the cluster, but are not 
available from outside the cluster. 
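
A minimal sketch of how the int_set cluster might be written is shown below. The choice of
representation (an array of integers without duplicates) is an assumption made for this example.
The forms rep and cvt, which give the routines access to the concrete representation, are
described later in this manual.

    int_set = cluster is create, insert, elements

        rep = array[int]                     % concrete representation, chosen for illustration

        create = proc () returns (cvt)
            return (rep$new())               % a new, empty set
            end create

        insert = proc (s: cvt, i: int)
            for x: int in rep$elements(s) do
                if x = i then return end     % already present
                end
            rep$addh(s, i)
            end insert

        elements = iter (s: cvt) yields (int)
            for x: int in rep$elements(s) do
                yield (x)
                end
            end elements

        end int_set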

1.4 Parameterized Modules 

Procedures, iterators, and clusters can all be parameterized. Parameterization provides the 
ability to define a class of related abstractions by means of a single module. Parameters are limited 
to the following types: int, real, bool, char, string, null, and type. The most interesting and
useful of these are the type parameters. 

When a module is parameterized by a type parameter, this implies that the module was written
without knowledge of what the actual parameter type would be. Nevertheless, if the module is to
do anything with objects of the parameter type, certain operations must be provided by any actual
type. Information about required operations is described in a where clause, which is part of the
heading of a parameterized module. For example,

    set = cluster [t: type] is create, insert, elements
                where t has equal: proctype (t, t) returns (bool)
is the heading of a parameterized cluster defining a generalized set abstraction. Sets of many
different element types can be obtained from this cluster, but the where clause states that the
element type is constrained to provide an equal operation.

To use a parameterized module, actual values for the parameters must be provided, using the
general form

    module_name [ parameter_values ]
Parameter values must be computable at compile-time. Providing actual parameters
selects one abstraction out of the class of related abstractions defined by the parameterized module;
since the values are known at compile-time, the compiler can do the selection and can check that
the where clause restrictions are satisfied. The result of the selection, in the case of a
parameterized cluster, is a type, which can then be used in declarations; in the case of
parameterized procedures or iterators, a procedure or iterator is obtained, which is then available
for invocation. For example, set[int] is a use of the set abstraction shown above, and is legal
because int does have an equal operation.

A parameterized cluster, procedure, or iterator is said to implement a type generator, procedure
generator, or iterator generator, respectively.
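
The same mechanism applies to procedures. As a hedged sketch (the procedure member is
hypothetical and is introduced only for illustration), a parameterized procedure that searches an
array might be written:

    member = proc [t: type] (a: array[t], x: t) returns (bool)
                where t has equal: proctype (t, t) returns (bool)
        for y: t in array[t]$elements(a) do
            if t$equal(x, y) then return (true) end    % the only operation of t that is used
            end
        return (false)
        end member

An invocation such as member[int](a, 3), where a is an array[int], is legal because int provides
an equal operation of the required type.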

1.5 Program Structure 

As was mentioned before, a program consists of a group of modules. Each module defines 
either a single abstraction or, if parameterized, a class of related abstractions. Modules are never 
embedded in other modules. Rather, the program is a single level structure, with all modules 
potentially usable by all other modules in the program. Type-checking of inter-module references 
is carried out using information in the module headings, augmented, in the case of clusters, by the 
headings of the procedures and iterators that implement the operations. 

Each module is a separate textual unit, and is compiled independently of other modules. 
Compilation and program construction are discussed in Section 4. 



2. Data Types 

One of the primary goals of CLU was to provide, through clusters, a type extension 
mechanism that permits user-defined types to be treated as similarly as possible to built-in types. 
This goal has been achieved to a large extent. Both built-in and user-defined types are viewed as 
providing sets of primitive operations, with access to the real representation information limited to 
just these operations. The ways in which built-in types differ from user-defined types will be 
discussed in Section 2.3 below. 

2.1 Built-in Types 

CLU provides a rich set of built-in types and type generators. The built-in types are int, real,
bool, char, string, null, and any. Int and real provide the usual arithmetic and relational 
operations on integers and real numbers, and bool provides the standard boolean operations. 
Char is the full ASCII character set; the usual relational operators are provided, along with 
conversion to and from integers. Strings are (possibly empty) sequences of characters; usual string 
operations like selecting the ith character, and concatenation are provided. However, strings are 
somewhat unusual in that string objects cannot be modified. For example, it is not possible to 
change a character in a string; instead, a new string, differing from the original in that position,
may be created.
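
For instance, a string t that differs from a string s only in its ith character might be built
as follows. This is a sketch that assumes the string operations substr, append, and rest
summarized in Appendix II, a character c, and an index i within the bounds of s:

    t: string := string$append(string$substr(s, 1, i - 1), c) || string$rest(s, i + 1)

The object denoted by s is unchanged; t denotes a new string object.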

Null is a type containing one object, nil. Null is used primarily in conjunction with the tagged 
union type discussed below. 

Any is provided to permit an escape from compile-time type-checking. The type any
introduces no new objects, but instead may be used as the type of a variable when the programmer 
wishes to assign objects of different types to that variable, or does not know what kind of object 
will be assigned to the variable. CLU provides a built-in procedure generator, force, which 
permits a run-time examination of the type of object named by a variable of type any. 
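
As a brief sketch (the variable names are ours), force is parameterized by the expected type and
signals wrong_type when the object is not of that type:

    x: any := 3
    i: int := force[int](x)                 % succeeds: x denotes an int
    s: string
    s := force[string](x)                   % signals wrong_type
        except when wrong_type: s := "" end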

The built-in type generators are: array, sequence, record, struct, oneof, variant,
proctype, and itertype. Arrays are one-dimensional. The type of element contained in the array
is specified by a type parameter, e.g., array[int] and array[array[int]]. (The latter example
shows how a two-dimensional array might be handled.) CLU arrays are unusual in that they can
grow dynamically. An array is often empty when first created, but there is also a special array 
constructor for specifying initial elements. Array operations can grow and shrink the array at 
either end, query the current size and low and high bounds of the array, and access and replace 
elements within the current bounds. 
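
The following fragment sketches these operations. It assumes the array operation names
summarized in Appendix II (addh, addl, remh, size, low); the variable names are ours:

    ai = array[int]                  % equate: ai abbreviates array[int]

    a: ai := ai$[1, 2, 3]            % array constructor; the low bound defaults to 1
    ai$addh(a, 4)                    % grow at the high end
    ai$addl(a, 0)                    % grow at the low end; the low bound becomes 0
    n: int := ai$size(a)             % 5
    lo: int := ai$low(a)             % 0
    a[2] := 20                       % replace an element within the current bounds
    x: int := ai$remh(a)             % shrink at the high end; x is 4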

Sequences are immutable arrays, in that the size of a sequence cannot be changed dynamically,
and new elements cannot be stored into a sequence. New sequences can be constructed from 
existing sequences in much the same way as new strings are created. Sequence operations are culled 
from both string and array operations, and there is a special sequence constructor, which is 
syntactically similar to the array constructor form. 

CLU records are heterogeneous collections of component objects; each component is accessed by 
a selector name. Records must be explicitly constructed by means of a special record constructor. 
The constructor requires that an object be provided for each component of the record; this 
requirement ensures that no component of the record is undefined in the sense of naming no 
object. Record operations permit selection of component objects and replacement of components 
with new objects. 
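
A short sketch of record construction, selection, and replacement follows; the type name point
and its components are hypothetical:

    point = record[x, y: int]

    p: point := point${x: 1, y: 2}   % an object must be supplied for every component
    p.x := p.x + 10                  % select, then replace, a component; p.x is now 11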

Structures are immutable records, in that the components of a structure cannot be replaced with 
new objects. Structures are constructed by means of a structure constructor, which is syntactically 
identical to the record constructor form.

A oneof type is a tagged, discriminated union. The objects of a oneof type each consist of a
tag (an identifier) and a component object; oneof objects with different tags may have component 
objects of different types. A oneof object, once created, cannot be changed. Thus, oneof types 
provide a capability similar to that provided by variant records in Pascal. Operations are 
provided for creating oneof objects. Oneof objects are usually decomposed through the tagcase 
statement. 

Variants are mutable oneofs. The tag and component object of a variant can be replaced 
simultaneously with new values. Like oneofs, variants are usually decomposed through the 
tagcase statement. 
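
As a sketch (the type int_or_nil and the variable names are hypothetical), a oneof object can be
created with a make operation and decomposed with tagcase:

    int_or_nil = oneof[some: int, none: null]

    v: int_or_nil := int_or_nil$make_some(3)
    n: int
    tagcase v
        tag some (i: int): n := i    % i denotes the component object
        tag none: n := 0
        end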

Procedure and iterator types provide procedures and iterators as first-class objects; i.e., routines 
(including those in clusters) can be assigned to variables and can occur as components of other 
objects. These types are parameterized by all the information appearing in a procedure or iterator 
heading, with the exception of the formal argument names. 
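
For example, in the following sketch (double and twice are hypothetical procedures), a procedure
is passed as an argument and invoked through the formal that denotes it:

    double = proc (x: int) returns (int)
        return (x + x)
        end double

    twice = proc (f: proctype (int) returns (int), x: int) returns (int)
        return (f(f(x)))             % invoke the procedure object denoted by f
        end twice

With these definitions, twice(double, 5) returns 20.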

In addition to all the built-in types and type generators mentioned above, CLU programs may 
also make use of the type type. The use of type values is limited to parameters of parameterized 
modules; there are no arguments or variables of type type. 

Finally, CLU provides a number of types and procedures to support I/O. These types are not 
considered to be built-in types of CLU, but they must be available in the library. These types are 
described in Appendix III. 

2.2 User-Defined Types 

Users may define new types by providing clusters that implement them. The cluster may 
implement a single type, or, in the case of a parameterized cluster, a group of related types. The 
type or types defined by a cluster are distinct from all built-in types and from all types defined by 
other clusters. 

2.3 Comparison of User-Defined and Built-in Types 

Little distinction is made between user-defined types and built-in types. Either can be used 
freely to declare the arguments, variables, and results of routines. In addition, in either case there 
is a set of primitive operations associated with the type, and the same syntax is used to invoke these 
operations. The ordinary syntax to name an operation is

    type $ op_name
Since different types will often have operations of the same name (e.g., create), this compound form
is used to avoid ambiguity.

For many operations there is also a customary abbreviated form of invocation, which can be 
used for user-defined types as well as for built-in types. There is a standard translation from each 
abbreviated form to the ordinary form of invocation. For example, an addition operation is 
usually invoked using the infix notation "x + y"; this is translated into "T$add(x, y)", where T is
the type of x. Extending notation to user-defined types in this way is sometimes called operator
overloading. We permit almost all special syntax to be overloaded; there are always constraints on
the overloading definition (e.g., add must have two input arguments and one result), but they are
quite minimal.
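
A few abbreviated forms and the invocations they stand for are sketched below. The fragment
assumes that x, y, and i are of type int, that a is an array[int], and that p is of a hypothetical
record type rt = record[x, y: int]:

    z: int := x + y                  % int$add(x, y)
    b: bool := x < y                 % int$lt(x, y)
    e: int := a[i]                   % array[int]$fetch(a, i)
    a[i] := e                        % array[int]$store(a, i, e)
    q: int := p.x                    % rt$get_x(p)
    p.x := q                         % rt$set_x(p, q)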

Nevertheless, there are three main distinctions between built-in types and user-defined types: 

1. Built-in type and type generator names cannot be redefined. (This is 
why we always show them in boldface in this document.) 

2. Some built-in types, e.g., int, real, etc., have literals. There is no 
mechanism for defining literals for user-defined types. 

3. Some built-in types are related to certain other constructs of CLU. For 
example, the tagcase statement is a control construct especially 
provided to permit discrimination on oneof and variant objects. In 
addition, in places where compile-time constants are required, e.g., as 
actual parameters to parameterized modules, the expressions that may 
appear are limited to a subset of the built-in types and their operations. 
One reason for this limitation is that the permitted types are known to 
contain only immutable objects (see Section 3.1). 



3. Semantics 

All languages present their users with some model of computation. This section describes those 
aspects of CLU semantics that differ from the common ALGOL-like model. In particular, we 
discuss the notions of objects and variables, and the definitions of assignment and argument 
passing that follow from these notions. We also discuss type-correctness.

3.1 Objects and Variables

The basic elements of CLU semantics are objects and variables. Objects are the data entities 
that are created and manipulated by programs. Variables are just the names used in a program to 
refer to objects. 

Each object has a type, which characterizes its behavior. A type defines a set of primitive 
operations to create and manipulate objects of that type. An object may be created and 
manipulated only via the operations of its type. 

An object may refer to objects. For example, a record object refers to the objects that are the 
components of the record. This notion is one of logical, not physical, containment. In particular, it 
is possible for two distinct record objects to refer to (or share) the same component object. In the 
case of a cyclic data structure, it is even possible for an object to "contain" itself. Thus, it is 
possible to have recursive data structure definitions and shared data objects without explicit 
reference types. 

Objects exist independently of procedure and iterator activations. Space for objects is 
allocated from a dynamic storage area as the result of invoking constructor operations of certain 
primitive CLU types, such as records and arrays. In theory, all objects continue to exist forever. 
In practice, the space used by an object may be reclaimed (via garbage collection) when that object 
is no longer accessible. (An object is accessible if it is denoted by a variable of an active routine or 
an own variable of any cluster or routine, or is a component of an accessible object.) 

Objects may be divided into two categories. Some objects exhibit time-varying behavior. 
Such an object, called a mutable object, has a state that may be modified by certain operations 
without changing the identity of the object. Records and arrays are examples of mutable objects. 
For example, replacing the ith element of any array a causes the state of a to change (to contain a 
different object as the ith element). 

If a mutable object m is shared by two other objects x and y, then a modification to m made 
via x will be visible when m is examined via y. Communication through shared mutable objects is 
most beneficial in the context of procedure invocation, described below. 

Objects that do not exhibit time-varying behavior are called immutable objects. Examples of 
immutable objects are integers, booleans, characters, and strings. The properties of an immutable 
object do not change with time. These properties generally do not include the properties of any 
component objects. For example, a sequence is immutable even though its elements may be
mutable.

Variables are names used in programs to denote particular objects at execution time. Unlike 
variables in many common programming languages, which are containers for values, CLU 
variables are simply names that the programmer uses to refer to objects. As such, it is possible for 
two variables to denote (or share) the same object. CLU variables are much like those in LISP, 
and are similar to pointer variables in other languages. However, CLU variables are not objects; 
they cannot be denoted by other variables or referred to by objects. Thus, variables declared 
within one routine cannot be accessed or modified by any other routine. 

3.2 Assignment and Invocation 

The basic actions in CLU are assignment and invocation. The assignment primitive x := E,
where x is a variable and E is an expression, causes x to denote the object resulting from the
evaluation of E. For example, if E is a simple variable y, then the assignment x := y causes x to
denote the object denoted by y. The object is not copied; after the assignment is performed, the
object will be shared by x and y. Assignment does not affect the state of any object.
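
The difference between assignment and mutation can be seen in a small sketch (the variable names
are ours):

    a: array[int] := array[int]$[1, 2, 3]
    b: array[int] := a               % no copy is made: a and b denote the same array object
    b[1] := 99                       % changes the state of the shared object; a[1] is now 99
    b := array[int]$[7]              % b now denotes a new object; a and its object are unaffected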

Figure 1 illustrates these notions of object, variable, and assignment. Here we show variables
in a stack, and objects in a heap (free storage area), an obvious way to implement CLU. Figure 1a
contains three objects: α, β, and γ. α is an integer (in fact, 3) and is denoted by variable x, while
β and γ are of type set[int] and are denoted by variables y and z, respectively. Figure 1b shows the
result of executing

    y := z
Now y and z both refer to, or share, the same object, γ; β is no longer accessible, and so can be
garbage collected.

Fig. 1. Assignment

Invocation involves passing argument objects from the caller to the called routine and
returning result objects from the routine to the caller. The objects returned by the procedure, or
yielded by an iterator, may be assigned to variables in the caller. Argument passing is defined in
terms of assignment; the formal arguments of a routine are considered to be local variables of the
routine and are initialized, by assignment, to the objects resulting from the evaluation of the
argument expressions. We call the argument passing technique call by sharing, because the
argument objects are shared between the caller and the called routine. The technique does not
correspond to most traditional argument passing techniques (it is similar to argument passing in
LISP). In particular it is not call by value because mutations of arguments performed by the called
routine will be visible to the caller. And it is not call by reference because access is not given to the
variables of the caller, but merely to certain objects.
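
Both halves of this distinction can be sketched with a hypothetical procedure bump, which mutates
one argument object and also assigns to its formals:

    bump = proc (a: array[int], n: int)
        a[1] := a[1] + 1             % mutation of the shared array: visible to the caller
        n := n + 1                   % assignment to a formal: not visible to the caller
        a := array[int]$new()        % likewise; the caller's variable still denotes the old array
        end bump

    v: array[int] := array[int]$[10]
    k: int := 0
    bump(v, k)                       % afterwards v[1] is 11 and k still denotes 0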

Figure 2 illustrates invocation and object mutation. Figure 2a continues from the situation
shown in Figure 1b, and illustrates the situation immediately after invocation of

    set[int]$insert(y, x)
(but before executing the body of insert). Insert has two formal arguments; the first, s, denotes the
set, and the second, v, denotes the integer to be inserted into s. Note that the variables of the caller
(x, y, and z) are not accessible to insert. Figure 2b illustrates the situation after insert returns. Note
that object γ has been modified and now refers to α (the set γ now contains 3), and since γ is
shared by both y and z, the modification of γ is visible through both these variables.

Procedure invocations may be used directly as statements; those that return exactly one object 
may also be used as expressions. Iterators may be invoked only through the for statement. 
Arbitrary recursion among procedures and iterators is permitted.

Fig. 2. Invocation and object mutation

3.3 Type-Correctness

The declaration of a variable specifies the type of the objects which the variable may denote. 
In an assignment, the object denoted by the right-hand side must have the same type as the 
variable on the left-hand side; there are no implicit type conversions. (The type of object denoted
by an expression is the return type of the outermost procedure invoked in that expression, or, if the 
expression is a variable or literal, the type of that variable or literal.) There is one special case; a 
variable declared to be of type any may be assigned the value of any expression. 
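
For example, an int cannot be assigned directly to a real variable; a conversion operation must
be invoked explicitly. The sketch below assumes the real operation i2r summarized in Appendix II:

    i: int := 3
    r: real := real$i2r(i)           % explicit conversion; "r: real := i" would be illegal
    x: any := i                      % legal: a variable of type any may denote any object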

Argument passing is defined in terms of assignment; for an invocation to be legal, it must be 
possible to assign the actual arguments (the objects) to the formal arguments (the variables) listed 
in the heading of the routine to be invoked. Furthermore, a return (or yield) statement is legal 
only if the result objects could be legally assigned to variables having the types stated in the 
routine heading.

CLU is a type-safe language, in that it is not possible to treat an object of type T as if it were
an object of some other type S; in particular, one cannot assign an object of type T to a variable of
type S (unless S is any). The type any provides an escape from compile-time type determination,
and a built-in procedure generator force can be used to query the type of an object at run-time.
However, any and force are defined in such a way that the type-safety of the language is not
undermined. The type-safety of CLU, plus the restriction that only the code in a cluster may
convert between the abstract type and the concrete representation, ensure that the behavior of an
object is indeed characterized completely by the operations of its type.



4. The Library 

As was mentioned earlier, it is intended that the modules making up a program all be separate 
compilation units. A fundamental requirement of any CLU implementation is that it support 
separate compilation, with type-checking of inter-module references. This checking can be done 
either at compile-time or at load-time (when a group of separately compiled modules are combined 
together to form a program). A second fundamental requirement is that the implementation 
support top-down programming. The definition of CLU does not specify how an implementation 
should meet these requirements. However, in this section we describe the current CLU 
implementation, which may serve as a model for others. 

Our implementation makes use of the CLU library, which plays a central role in supporting 
inter-module references. The library contains information about all abstractions. It supports 
incremental program development, one abstraction at a time, and, in addition, makes abstractions
that are defined during the construction of one program available as a basis for subsequent
program development. The information in the library permits the separate compilation of single
modules, with complete type-checking at compile-time of all external references (such as procedure
names).

The library provides a hierarchical name space for retrieving information about abstractions. 
The leaf nodes of the library are description units (DUs), one for each abstraction. Figure 3
illustrates the structure of the library.

Fig. 3. A sketch of the library structure showing a DU with pathname B.Y

A DU contains all system-maintained information about its abstraction. A sketch of the 
structure of a DU is shown in Figure 4. For purposes of program development and module 
compilation, two pieces of information must be included in the DU: implementation information, 
describing zero or more modules that implement the abstraction, and the interface specification. 
The interface specification is that information needed to type-check uses of the abstraction. For 
procedural and control abstractions, this information consists of the number and types of 
parameters, arguments, and results, the names of exceptional conditions and the number and types 
of results returned in each case, plus any constraints on type parameters (i.e., the where clause, as 
described in Section 1.4). For data abstractions, it includes the number and types of parameters,
constraints on type parameters, and the name and interface specification of each operation.

An abstraction is entered in the library by submitting the interface specification; no 
implementations are required. In fact, a module can be compiled before any implementations have 
been provided for the abstractions that it uses; it is necessary only that interface specifications have 
been given for those abstractions. Ultimately, there can be many implementations of an 
abstraction; each implementation is required to satisfy the interface specification of the abstraction. 
Because all uses and implementations of an abstraction are checked against the interface 
specification, the actual selection of an implementation can be delayed until just before (or perhaps 
during) execution. We imagine a process of binding together modules into programs, prior to 
execution, at which time this selection would be made. 

An important detail is the method by which modules refer to abstractions. To avoid the 
problems of name conflicts that can arise in large systems, the names used by a module to refer to 
abstractions can be chosen to suit the programmer's convenience. When a module is submitted for 
compilation, its external references must be bound to DUs so that type-checking can be performed. 
The binding is accomplished by constructing a compilation environment (CE), mapping names to 
DUs and constants, which is passed to the compiler along with the source code when compiling the 
module. A copy of the CE is stored by the compiler in the library as part of the module. A similar 
process is involved in entering interface specifications of abstractions, since these interfaces can 
include references to other (data) abstractions. 

When the compiler type-checks a module, it uses the compilation environment to map the 
external names in the module to constants and DUs, and then uses the interface specifications in 
the referenced DUs to check that the abstractions are used correctly. The type-correctness of the 
module thus depends upon the binding of external references and the interface specifications of all 
referenced DUs, and could be invalidated if changes to the binding or the interface specifications 
were subsequently made. For this reason, the process of compilation permanently binds a module to 
the abstractions it uses, and the interface specification of an abstraction, once defined, is not 
allowed to change. Of course, a new DU can be created to describe a modified abstraction. 
Furthermore, during design (before any implementing modules have been entered into the system) 
it is reasonable to permit abstraction interfaces to change. 

Typically a small- to medium-sized project will use only one CE, thereby establishing a 
consistent vocabulary for use by all programmers. Larger projects might have a number of 
(possibly "overlapping") CEs, each specialized for some subproject. 

The library and DU structure described above can be used for purposes other than compiling 
and loading programs. In each case, additional information can be stored in the DU; the "other" 
fields shown in Figure 4 are intended to illustrate such additional information. For example, the 
library provides a good basis for program verification. Here the "other" information in the DU 
would contain a formal specification of the abstraction, and possibly some theorems that had been 
proved about the abstraction, while for each implementation that had been verified, an outline of 
the correctness proof might be retained. Additional uses of the library include retention of 
debugging and optimization information. 



5. Notation 

We use an extended BNF grammar to define the syntax. The general form of a production is: 

    nonterminal ::= alternative 
                  | alternative 
                  | ... 
                  | alternative 

The following extensions are used: 

    a , ...    a list of one or more a's separated by commas: "a" or "a, a" or "a, a, a" etc. 
    {a}        a sequence of zero or more a's: " " or "a" or "a a" etc. 
    [a]        an optional a: " " or "a". 

Nonterminal symbols appear in normal face. Reserved words appear in bold face. All other 
terminal symbols are non-alphabetic, and appear in normal face. 

Full productions are not always shown in the body of this manual; often alternatives are 
presented and explained individually. Appendix I contains the complete syntax. 

6. Lexical Considerations 

A module is written as a sequence of tokens and separators. A token is a sequence of "printing" 
ASCII characters (octal value 40 thru 176) representing a reserved word, an identifier, a literal, an 
operator, or a punctuation symbol. A separator is a "blank" character (space, vertical tab, horizontal 
tab, carriage return, newline, form feed) or a comment. In general, any number of separators may 
appear between tokens. Tokens and separators are described in more detail in the sections below. 

6.1 Reserved Words 



The following character sequences are reserved words: 

    any        cvt       force      oneof      returns    true 
    array      do        has        others     sequence   type 
    begin      down      if         own        signal     up 
    bool       else      in         proc       signals    variant 
    break      elseif    int        proctype   string     when 
    cand       end       is         real       struct     where 
    char       except    iter       record     tag        while 
    cluster    exit      itertype   rep        tagcase    yield 
    continue   false     nil        resignal   then       yields 
    cor        for       null       return 

Upper and lower case letters are not distinguished in reserved words. For example, 'end', 'END', 
and 'eNd' are all the same reserved word. Reserved words appear in bold face in this document. 



6.2 Identifiers 

An identifier is a sequence of letters, digits, and underscores that begins with a letter or 
underscore, and that is not a reserved word. As in reserved words, upper and lower case letters are 
not distinguished in identifiers. 

In the syntax there are two different nonterminals for identifiers. The nonterminal idn is used 
when the identifier has scope (see Section 8.1); idns are used for variables, parameters, module 
names, and as abbreviations for constants. The nonterminal name is used when the identifier is 
not subject to scope rules; names are used for record and structure selectors, oneof and variant tags, 
operation names, and exceptional condition names. 

6.3 Literals 

There are literals for naming objects of the built-in types null, bool, int, real, char, and 
string. Their forms are discussed in Section 7. 

6.4 Operators and Punctuation Symbols 

The following character sequences are used as operators and punctuation symbols: 

    (    {    :     <     ~<     +    ** 
    )    }    :=    <=    ~<=    -    // 
    [    ,    $     =     ~=     *    || 
    ]    .          >=    ~>=    /    & 
                    >     ~>          | 
                          ~ 


6.5 Comments and Other Separators 

A comment is a sequence of characters that begins with a percent sign (%), ends with a newline 
character, and contains only printing ASCII characters and horizontal tabs in between. For 
example: 

    i := a[i] +    % a comment in an expression 
         b[i] 

A separator is a blank character (space, vertical tab, horizontal tab, carriage return, newline, 
form feed) or a comment. Zero or more separators may appear between any two tokens, except that 
at least one separator is required between any two adjacent non-self-terminating tokens: reserved 
words, identifiers, integer literals, and real literals. This rule is necessary to avoid lexical 
ambiguities. 

6.6 Semicolons 

The use of semicolons (;) to terminate statements and various phrases is permitted in CLU, but 
semicolons are completely optional and their use is discouraged. Placement of semicolons is not 
shown in the body of this manual; refer to the complete syntax in Appendix I. 

7. Types, Type Generators, and Type Specifications 

A type consists of a set of objects together with a set of operations to manipulate the objects. 
As discussed in Section 3.1, types can be classified according to whether their objects are mutable or 
immutable. An immutable object (e.g. an integer) has a value that never varies, while the value 
(state) of a mutable object can vary over time. 

A type generator is a parameterized type definition, representing a (usually infinite) set of 
related types. A particular type is obtained from a type generator by writing the generator name 
along with specific values for the parameters; for every distinct set of legal values, a distinct type is 
obtained. For example, the array type generator has a single parameter that determines the 
element type; array[int], array[real], and array[array[int]] are three distinct types defined by 
the array type generator. Types obtained from type generators are called parameterized types; 
others are called simple types. 

Within a program, a type is specified by a syntactic construct called a type_spec. The type 
specification for a simple type is just the identifier (or reserved word) naming the type. For 
parameterized types, the type specification consists of the identifier (or reserved word) naming the 
type generator, together with the parameter values. 

This section gives an informal introduction to the built-in types and type generators provided 
by CLU; many details (such as error conditions) are not discussed. Complete and precise 
definitions are given in Appendix II. Sections 7.1 to 7.7 describe the objects, literals, and some of 
the operations for each of the built-in types, while Sections 7.8 to 7.14 describe the objects, type 
specifications, and interesting operations of types obtained from the built-in type generators. A 
number of operations can be invoked using infix and prefix operators; as the various operation 
names are introduced, the corresponding operator, if any, will follow in parentheses. 

In addition, we describe type specifications for user-defined types, and other special type 
specifications in Section 7.15. The mechanism by which new types and type generators are 
implemented is presented in Section 13. 

7.1 Null 

The type null has exactly one immutable object, represented by the literal nil. The type null is 
generally used as a kind of "place filler" in a oneof or variant type (see Sections 7.12 and 7.13). 

7.2 Bool 

The two immutable objects of type bool, with literals true and false, represent logical truth 
values. The binary operations equal (=), and (&), and or (|), are provided, as well as unary not (~). 

7.3 Int 

The type int models (a range of) the mathematical integers. The exact range is not part of the 
language definition, and can vary somewhat from implementation to implementation (see 
Appendix II, Section 3). Integers are immutable, and are written as a sequence of one or more 
decimal digits. The binary operations add (+), sub (-), mul (*), div (/), mod (//), and power (**) are 
provided, as well as unary minus (-). There are binary comparison operations lt (<), le (<=), equal 
(=), ge (>=), and gt (>). In addition, there are two operations, from_to and from_to_by, for iterating 
over a sequence of integers. For example, one can iterate over the odd numbers between 1 and 100 
with 

    for i: int in int$from_to_by(1, 100, 2) do ...compute... end 
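
As a slightly fuller illustration of from_to_by, the following sketch sums the odd numbers 
between 1 and 100. The procedure name sum_odds is hypothetical and is used only for this example. 

```clu
% Hypothetical procedure: sums the odd integers 1, 3, ..., 99.
sum_odds = proc () returns (int)
    total: int := 0
    for i: int in int$from_to_by(1, 100, 2) do
        total := total + i
        end
    return (total)
    end sum_odds
```

The loop variable i takes the values 1, 3, 5, and so on up to 99, so the result is 2500. 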

7.4 Real 

The type real models (a subset of) the mathematical real numbers. The exact subset is not 
part of the language definition, although certain constraints are imposed (see Appendix II, 
Section 4). Reals are immutable, and are written as a mantissa with an (optional) exponent. A 
mantissa is either a sequence of one or more decimal digits, or two sequences (one of which may be 
empty) joined by a period. The mantissa must contain at least one digit. An exponent is 'E' or 'e', 
optionally followed by '+' or '-', followed by one or more decimal digits. An exponent is required if 
the mantissa does not contain a period. As is usual, mEx = m*10^x. Examples of real literals are: 

    3.14    3.14E0    314e-2    .0314E+2    3.    .14 

As with integers, the operations add (+), sub (-), mul (*), div (/), mod (//), power (**), minus (-), 
lt (<), le (<=), equal (=), ge (>=), and gt (>), are provided. It is important to note that there is no 
form of implicit conversion between types. So, for example, the various binary operators cannot 
have one integer and one real argument. The i2r operation converts an integer to a real, r2i 
rounds a real to an integer, and trunc truncates a real to an integer. 
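
The absence of implicit conversion can be illustrated by a short fragment. This is a sketch 
only: it assumes that i2r, r2i, and trunc are invoked as operations of type real (Appendix II 
gives the precise definitions), and the variable names are hypothetical. 

```clu
n: int := 3
x: real := real$i2r(n) + 0.5     % 3.5; "n + 0.5" would not type-check
r: int := real$r2i(2.7)          % rounds to 3
t: int := real$trunc(2.7)        % truncates to 2
```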

7.5 Char 

The type char provides the alphabet for text manipulation. Characters are immutable, and 
form an ordered set. Every implementation must provide at least 128, but no more than 512, 
characters; the first 128 characters are the ASCII characters in their standard order. 

Printing ASCII characters (octal 40 thru octal 176), other than single quote or backslash, can be 
written as that character enclosed in single quotes. Any character can be written by enclosing one 
of the following escape sequences in single quotes: 

    escape sequence    character 
    \'                 ' (single quote) 
    \"                 " (double quote) 
    \\                 \ (backslash) 
    \n                 NL (newline) 
    \t                 HT (horizontal tab) 
    \p                 FF (form feed, newpage) 
    \b                 BS (backspace) 
    \r                 CR (carriage return) 
    \v                 VT (vertical tab) 
    \***               specified by octal value (exactly three octal digits) 

The escape sequences may be written using upper case letters. Examples of character literals are: 

    '7'    'a'    '"'    '\"'    '\''    '\B'    '\177' 

There are two operations, i2c and c2i, for converting between integers and characters: the 
smallest character corresponds to zero, and the characters are numbered sequentially. Binary 
comparison operations exist for characters based on this numerical ordering: lt (<), le (<=), equal (=), 
ge (>=), and gt (>). 

7.6 String 

The type string is used for representing text. A string is an immutable sequence of zero or 
more characters. Strings are lexicographically ordered, based on the ordering for characters. A 
string is written as a sequence of zero or more character representations, enclosed in double quotes. 
Within a string literal, a printing ASCII character other than double quote or backslash is 
represented by itself. Any character can be represented by using the escape sequences listed above. 
Examples of string literals are: 

    "Item\tCost"    "altmode (\033) = \\033" 

The characters of a string are indexed sequentially starting from one, and there are a number 
of operations that deal with these indexes: fetch, substr, rest, indexc, and indexs. The fetch 
operation is used to obtain a character by index. Invocations of fetch can be written using a special 
syntax (fully described in Section 10.5.1): 

    s[i]    % get the character at index i of s 

Substr returns a string given a string, a starting index, and a length: 

    string$substr("abcde", 2, 3) = "bcd" 

Rest, given a string and a starting index, returns the rest of the string: 

    string$rest("abcde", 3) = "cde" 

Indexc computes the least index at which a character occurs in a string, and indexs does the same 
for a string; the result is zero if the character or string does not occur: 

    string$indexc('d', "abcde") = 4 
    string$indexs("cd", "abcde") = 3 
    string$indexs("abcde", "cd") = 0 

Two strings can be concatenated together with concat (||), and a single character can be 
appended to the end of a string with append. Note that string$concat("abc", "de") and 
string$append("abcd", 'e') produce the same string as writing "abcde". C2s converts a character to 
a single-character string. The size of a string can be determined with size. Chars iterates over the 
characters of a string, from the first to the last character. There are also the usual lexicographic 
comparison operations: lt (<), le (<=), equal (=), ge (>=), and gt (>). 
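
The chars iterator can be combined with character comparison, as in the following sketch. 
The procedure name count_char is hypothetical and is introduced only for illustration. 

```clu
% Hypothetical procedure: counts the occurrences of c in s.
count_char = proc (c: char, s: string) returns (int)
    n: int := 0
    for ch: char in string$chars(s) do
        if ch = c then n := n + 1 end
        end
    return (n)
    end count_char
```

For instance, count_char('a', "banana") visits six characters and returns 3. 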

7.7 Any 

A type specification is used to restrict the class of objects that a variable can denote, a 
procedure or iterator can take as arguments, a procedure can return, etc. There are times when no 
restrictions are desired, when any object is acceptable. At such times, the type specification any is 
used. For example, one might wish to implement a table mapping strings to arbitrary objects, with 
the intention that different strings could map to objects of different types. The lookup operation, 
used to get the object corresponding to a string, would have its result declared to be of type any. 

The type any is the union of all possible types, and it is the only true union type in CLU; all 
other types are base types. Every object is of type any, as well as being of some base type. The 
type any has no operations; however, the base type of an object can be tested at run-time (see 
Section 10.11). 

7.8 Array Types 

Arrays are one-dimensional, and are mutable. Arrays are unconventional because the number 
of elements in an array can vary dynamically. Furthermore, there is no notion of an "uninitialized" 
element. 

The state of an array consists of an integer called the low bound, and a sequence of objects 
called the elements. The elements of an array are indexed sequentially, starting from the low 
bound. All of the elements must be of the same type; this type is specified in the array type 
specification, which has the form 

    array [ type_spec ] 

Examples of array type specifications are 

    array[int] 
    array[array[string]] 

There are a number of ways to create a new array, of which only two are mentioned here. 
The create operation takes an argument specifying the low bound, and creates a new array with 
that low bound and no elements. An array constructor can be used to create an array with an 
arbitrary number of initial elements. For example, 

    array[int]$[5: 1, 2, 3, 4] 
creates an integer array with low bound 5, and four elements, while 

    array[bool]$[true, false] 
creates a boolean array with low bound 1 (the default), and two elements. Array constructors are 
discussed fully in Section 10.6.1. 

An array type specification states nothing about the bounds of an array. This is because 
arrays can grow and shrink dynamically. Addh adds an additional element to the end of the array, 
with index one greater than the previous last element. Addl adds an additional element to the 
beginning of the array, and decrements the low bound by one, so that the new first element has an 
index one less than the previous first element. Remh removes the last element; reml removes the 
first element and increments the low bound. Note that all of these operations preserve the indexes 
of the other elements. Also note that these operations do not create holes; they merely add to or 
remove from the ends of the array. 

As an example, if a remh were performed on the integer array 
    array[int]$[5: 1, 2, 3, 4] 
the element 4 would disappear, and the new last element would be 3, still with index 7. If a new 
element were added using addl, it would become the new first element, with index 4. 

The fetch operation extracts an element by index, and the store operation replaces an element 
by index; an index is illegal if no element with that index exists. Invocations of these operations 
can be written using special forms (covered fully in Sections 10.5.1 and 11.2.1): 

    a[i]         % fetch the element at index i of a 
    a[i] := 3    % store 3 at index i of a (not really assignment) 

The top and bottom operations return the element with the highest and lowest index, 
respectively. The high and low operations return the highest and lowest indexes, respectively. The 
elements iterator yields the elements from bottom to top, and the indexes iterator yields the indexes 
from low to high. There is also a size operation that returns the number of elements. 
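
The way the bounds move as an array grows and shrinks can be traced in a short fragment. 
This is a sketch under the assumption that "ai" has been equated to array[int]; the variable 
names are hypothetical. 

```clu
ai = array[int]

a: ai := ai$[5: 1, 2, 3, 4]      % low bound 5; elements at indexes 5 through 8
last: int := ai$remh(a)          % removes the element 4; highest index is now 7
ai$addl(a, 0)                    % 0 becomes the first element, at index 4
ai$addh(a, 9)                    % 9 becomes the last element, at index 8
lo: int := ai$low(a)             % 4
hi: int := ai$high(a)            % 8
n: int := ai$size(a)             % 5
a[5] := a[4] + a[8]              % store 9 at index 5
```

Throughout, the elements that are not added or removed keep their original indexes. 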

Every newly created array has an identity that is distinct from all other arrays; two arrays can 
have the same elements without being the same array object. The identity of arrays can be 
distinguished with the equal (=) operation. The similar1 operation tests if two arrays have the 
same state, using the equal operation of the element type. Similar tests if two arrays have similar 
states, using the similar operation of the element type. For example, writing 

    ai$[3: 1, 2, 3] 
(where "ai" is equated to array[int]) in different places produces arrays that are similar1 and 
similar (but not equal), while the following produces arrays that are similar, but not similar1 (or 
equal): 

    array[ai]$[1: ai$create(1)] 

7.9 Sequence Types 

Sequences are immutable arrays. Although an individual sequence can have any length, that 
length cannot vary dynamically, and the elements of the sequence cannot be replaced. The elements 
of a sequence are indexed sequentially, starting from one. A sequence type specification has the 
form 

sequence [ type_spec ] 

The new operation returns an empty sequence. A sequence constructor, which is syntactically 
similar to the array constructor, can be used to create a sequence with an arbitrary number of 
elements. Sequence constructors are discussed fully in Section 10.6.2. 

Although a sequence, once created, cannot be changed, new sequences can be constructed from 
existing ones. Addh creates a new sequence with an additional element at the end (with index one 
greater than the last element of the old sequence). Addl creates a new sequence with an additional 
element at the beginning, with index one, so that every other element has an index one greater 
than its index in the old sequence. Remh creates a new sequence with the last element removed; 
reml creates a new sequence with the first element removed. Note that, for each of these operations, 
element objects are shared between the old and new sequences. 

The fetch operation extracts an element by index, and the replace operation creates a new 
sequence with a new element at a given index; an index is illegal if no element with that index 
exists. Invocations of the fetch operation can be written using a special form (covered fully in 
Section 10.5.1): 

    q[i]    % fetch the element at index i of q 

The top and bottom operations return the element with the highest and lowest index, 
respectively. The size operation returns the number of elements. The elements iterator yields the 
elements from bottom to top, and the indexes iterator yields the indexes in increasing order, starting 
from one. Two sequences can be concatenated together with concat (||) to produce a new sequence, 
and subseq extracts a subsequence of a sequence. 

Two sequences with the same elements are the same sequence. The equal (=) operation tests if 
two sequences have the same elements, using the equal operation of the element type. Similar tests 
if two sequences have similar elements, using the similar operation of the element type. For 
example, writing 

    sequence[array[int]]$[array[int]$[1]] 
in different places produces sequences that are similar but not equal. 

7.10 Record Types 

A record is a mutable collection of one or more named objects. The names are called selectors, 
and the objects are called components. Different components may have different types. A record 
type specification has the form 

record [ field_spec , ... ] 
where 

    field_spec ::= name , ... : type_spec 
Selectors must be unique within a specification, but the ordering and grouping of selectors is 
unimportant. For example, all of the following name the same type: 

    record[last, first, middle: string, age: int] 
    record[first, middle, last: string, age: int] 
    record[last: string, age: int, first, middle: string] 

A record is created using a record constructor. For example: 

    info${last: "Jones", first: "John", age: 32, middle: "J."} 

(assuming that "info" has been equated to one of the above type specifications; see Section 8.3). An 
expression must be given for each selector, but the order and grouping of selectors need not 
resemble the corresponding type specification. Record constructors are discussed fully in 
Section 10.6.3. 

For each selector "sel", there is an operation get_sel to extract the named component, and an 
operation set_sel to replace the named component with some other object. For example, there are 
get_middle and set_middle operations for the type specified above. Invocations of these operations 
can be written in a special form (discussed fully in Sections 10.5.2 and 11.2.2): 

    r.middle       % get the 'middle' component of r 
    r.age := 33    % set the 'age' component of r to 33 (not really assignment) 

As with arrays, every newly created record has an identity that is distinct from all other 
records; two records can have the same components without being the same record object. The 
identity of records can be distinguished with the equal (=) operation. The similar1 operation tests 
if two records have the same components, using the equal operations of the component types. 
Similar tests if two records have similar components, using the similar operations of the component 
types. 
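
The difference between equal and similar1 on records can be seen in a short fragment. This 
is a sketch only: it assumes "info" is equated as shown, that record constructors are written 
with braces as defined in Section 10.6.3, and the variable names are hypothetical. 

```clu
info = record[last, first, middle: string, age: int]

r1: info := info${last: "Jones", first: "John", age: 32, middle: "J."}
r2: info := info${last: "Jones", first: "John", age: 32, middle: "J."}
r3: info := r1                       % r3 denotes the same record object as r1

b1: bool := r1 = r2                  % false: two distinct record objects
b2: bool := info$similar1(r1, r2)    % true: corresponding components are equal
r3.age := 33                         % mutates the object shared by r1 and r3
b3: bool := r1.age = 33              % true: the change is visible through r1
b4: bool := info$similar1(r1, r2)    % false: the ages now differ
```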

7.11 Structure Types 

A structure is an immutable record. A structure type specification has the form 
    struct [ field_spec , ... ] 
where (as for records) 

    field_spec ::= name , ... : type_spec 
A structure is created using a structure constructor, which syntactically is identical to a record 
constructor. Structure constructors are discussed fully in Section 10.6.4. 

For each selector "sel", there is an operation get_sel to extract the named component, and an 
operation replace_sel to create a new structure with the named component replaced with some other 
object. Invocations of the get operations can be written in a special form (discussed fully in 
Section 10.5.2): 

    st.seldom    % get the 'seldom' component of st 

As with sequences, two structures with the same components are in fact the same object. The 
equal (=) operation tests if two structures have the same components, using the equal operations of 
the component types. Similar tests if two structures have similar components, using the similar 
operations of the component types. 

7.12 Oneof Types 

A oneof type is a tagged discriminated union. A oneof is an immutable labeled object, to be 
thought of as "one of" a set of alternatives. The label is called the tag, and the object is called the 
value. A oneof type specification has the form 

    oneof [ field_spec , ... ] 
where (as for records) 

    field_spec ::= name , ... : type_spec 
Tags must be unique within a specification, but the ordering and grouping of tags is unimportant. 
As an example of a oneof type, the representation type for an immutable linked list of integers, 
int_list, might be written 

    oneof[empty: null, 
          pair:  struct[car: int, cdr: int_list]] 

As another example, the contents of a "number container" might be specified by 

    oneof[empty:       null, 
          integer:     int, 
          real_num:    real, 
          complex_num: complex] 

For each tag "t" of a oneof type, there is a make_t operation which takes an object of the type 
associated with the tag, and returns the object (as a oneof) labeled with tag "t". For example, 

    number$make_real_num(1.37) 
creates a oneof object with tag "real_num" (assuming "number" has been equated to the "number 
container" type specification above; see Section 8.3). 

The equal (=) operation tests if two oneofs have the same tag, and if so, tests if the two value 
components are the same, using the equal operation of the value type. Similar tests if two oneofs 
have the same tag, and if so, tests if the two value components are similar, using the similar 
operation of the value type. 

To determine the tag and value of a oneof object, one normally uses the tagcase statement, 
discussed in Section 11.6. 
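
As a preview of how a oneof is taken apart, the following sketch converts the contents of a 
"number container" to a real. It assumes that "number" has been equated to the type 
specification given above and that i2r is invoked as an operation of type real; the procedure 
name to_real and the exception name not_real are hypothetical. 

```clu
% Hypothetical procedure: extracts a real from a number container.
to_real = proc (n: number) returns (real) signals (not_real)
    tagcase n
        tag integer (i: int):
            return (real$i2r(i))
        tag real_num (r: real):
            return (r)
        others:
            signal not_real
        end
    end to_real
```

Each tag arm names the tag it handles and, optionally, declares a variable that denotes the 
value component; the others arm covers the remaining tags (here "empty" and "complex_num"). 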

7.13 Variant Types 

A variant is a mutable oneof. A variant type specification has the form 
    variant [ field_spec , ... ] 
where (as for records) 

    field_spec ::= name , ... : type_spec 
The state of a variant is a pair consisting of a label called the tag and an object called the value. 
For each tag "t" of a variant type, there is a make_t operation which takes an object of the type 
associated with the tag, and returns the object (as a variant) labeled with tag "t". In addition, there 
is a change_t operation, which takes an existing variant and an object of the type associated with 
"t", and changes the state of the variant to be the pair consisting of the tag "t" and the given 
object. 

Every newly created variant has an identity that is distinct from all other variants; two 
variants can have the same state without being the same variant object. The identity of variants 
can be distinguished using the equal (=) operation. The similar1 operation tests if two variants 
have the same tag, and if so, tests if the two value components are equal, using the equal operation 
of the value type. Similar tests if two variants have the same tag, and if so, tests if the two value 
components are similar, using the similar operation of the value type. 

To determine the tag and value of a variant object, one normally uses the tagcase statement, 
as discussed in Section 11.6. 

7.14 Procedure and Iterator Types 

Procedures and iterators are objects created by the CLU system (see Section 3.1). The type 
specification for a procedure or iterator contains most of the information stated in a procedure or 
iterator heading; a procedure type specification has the form 

    proctype ( [ type_spec , ... ] ) [ returns ] [ signals ] 
and an iterator type specification has the form 

    itertype ( [ type_spec , ... ] ) [ yields ] [ signals ] 
where 

    returns   ::= returns ( type_spec , ... ) 
    yields    ::= yields ( type_spec , ... ) 
    signals   ::= signals ( exception , ... ) 
    exception ::= name [ ( type_spec , ... ) ] 

The first list of type specifications describes the number, types, and order of arguments. The 
returns or yields clause gives the number, types, and order of the objects to be returned or 
yielded. The signals clause lists the exceptions raised by the procedure or iterator; for each 
exception name, the number, types, and order of the objects to be returned is also given. All names 
used in a signals clause must be unique, and cannot be failure, which has a standard meaning in 
CLU (see Section 12.1). The ordering of exceptions is not important. For example, both of the 
following type specifications name the procedure type for string$substr: 

    proctype (string, int, int) returns (string) signals (bounds, negative_size) 
    proctype (string, int, int) returns (string) signals (negative_size, bounds) 

String$chars has the following iterator type: 

    itertype (string) yields (char) 

Procedure and iterator types have an equal (=) operation. Invocation is not an operation, but a 
primitive action of CLU semantics (see Section 9.3). 

7.15 Other Type Specifications 

The type specification for a user-defined type has the form 
idn [ [ constant ,...]] 
where each constant must be computable at compile-time (see Section 8.3). The identifier must be 
bound to a data abstraction (see Section 4). If the referenced abstraction is parameterized, 
constants of the appropriate types and number must be supplied. The order of parameters always 
matters in user-defined types. 

There are three special type specifications that are used when implementing new abstractions: 
rep, cvt, and type. These forms are discussed in Sections 13.3 and 13.4. Within an 
implementation of an abstraction, formal parameters declared with type can be used as type 
specifications. 

In addition, identifiers which have been equated to type specifications can also be used as type 
specifications. Equates are discussed in Section 8.3. 

8. Scopes, Declarations, and Equates 

We now describe how to introduce and use constants and variables, and the scope of constant 
and variable names. Scoping units are described first, followed by a discussion of variables, and 
finally constants. 

8.1 Scoping Units 

Scoping units follow the nesting structure of statements. Generally, a scoping unit is a body 
and an associated "heading". The scoping units are (refer also to Appendix I): 

1. From the start of a module to its end. 

2. From a cluster, proc, or iter to the matching end. 

3. From a for, do, or begin to the matching end. 

4. From a then or else in an if statement to the end of the corresponding 
body. 

5. From a tag or others in a tagcase statement to the end of the 
corresponding body. 

6. From a when or others in an except statement to the end of the 
corresponding body. 

7. From the start of a type_set to its end. 

The last case above, the scope in a type_set, is a special case that will be discussed in Section 13.4. 
Whatever we say about scopes in the remainder of this section refers only to cases 1 through 6. 

The structure of scoping units is such that if one scoping unit overlaps another scoping unit 
(textually), then one is fully contained in the other. The contained scope is called a nested scope, 
and the containing scope is called a surrounding scope. 

New constant and variable names may be introduced in a scoping unit. Names for constants 
are introduced by equates, which are syntactically restricted to appear grouped together at or near 
the beginning of scoping units. For example, equates may appear at the beginning of a body, but 
not after any statements in the body. 

In contrast, declarations, which introduce new variables, are allowed wherever statements are 
allowed, and hence may appear throughout a scoping unit. Equates and declarations are discussed 
in more detail in the following two sections. 

In the syntax there are two distinct nonterminals for identifiers: idn and name. Any identifier 
introduced by an equate or declaration is an idn, as is the name of the module being defined, and 
any operations it has. An idn names a specific type or object. The other kind of identifier is a 
name. A name is used to refer to a subpiece of something, and is always used in context; for 
example, names are used as record selectors. The scope rules apply only to idns. 

The scope rules are very simple: 

1. An idn may not be redefined in its scope. 

2. Any idn that is used as an external reference in a module may not be 
used for any other purpose in that module. 

Unlike other "block-structured" languages, CLU prohibits the redefinition of an identifier in a
nested scope. An identifier used as an external reference names a module or constant; the reference
is resolved using the compilation environment (see Section 4).

8.2 Variables 

Objects are the fundamental "things" in the CLU universe; variables are a mechanism for 
denoting (i.e., naming) objects. This underlying model is discussed in detail in Section 3. A 
variable has two properties: its type, and the object that it currently denotes (if any). A variable is 
said to be uninitialized if it does not denote any object. 

There are only three things that can be done with variables: 

1. New variables can be introduced. Declarations perform this function, 
and are described below. 

2. An object may be assigned to a variable. After an assignment the 
variable denotes the object assigned. Assignment is discussed in 
Section 9.2. 

3. A variable may be used as an expression. The value of such an 
expression (i.e., the result of evaluating it) is the object that the
variable denotes at the time the expression is evaluated. Expressions 
and their evaluation are described in Section 10. 

8.2.1 Declarations 

Declarations introduce new variables. The scope of a variable is from its declaration to the 
end of the smallest scoping unit containing its declaration; hence, variables must be declared before 
use.

There are two sorts of declarations: those with initialization, and those without. Simple
declarations (those without initialization) take the form

decl ::= idn , ... : type_spec

A simple declaration introduces a list of variables, all having the type given by the type_spec. This
type determines the types of objects that can be assigned to the variable. Some examples of simple 
declarations are: 

i: int                % declare i to be an integer variable
i, j, k: char         % declare i, j, and k to be character variables
x, y: complex         % declare x and y to be of type complex
z: any                % declare z to be of type any; thus, z may denote any object

The variables introduced in a simple declaration initially denote no objects, i.e., they are
uninitialized. Attempts to use uninitialized variables (if not detected at compile-time) cause the
run-time exception

failure("uninitialized variable")

(Exceptions are discussed in Section 12.) 

8.2.2 Declarations with Initialization 

A declaration with initialization combines declarations and assignments into a single statement. 
A declaration with initialization is entirely equivalent to one or more simple declarations followed 
by an assignment statement. The two forms of declaration with initialization are: 

idn : type_spec := expression

and

decl_1 , ... , decl_n := invocation

These are equivalent to (respectively):

idn : type_spec
idn := expression

and

decl_1 ... decl_n                   % declaring idn_1 ... idn_m
idn_1 , ... , idn_m := invocation

In the second form, the order of the idns in the assignment statement is the same as in the original 
declaration with initialization. (The invocation must return m objects; see Section 9.2.2.)

Some examples of declarations with initialization are:

astr: array[string] := array[string]$create(1)
        % declare astr to be an array variable and initialize it to an empty array

first, last: string, balance: int := acct$query(acct_no)
        % declare first and last to be string variables, balance an integer variable,
        % and initialize them to the results of a bank account query

The above two statements are equivalent to the following sequences of statements:

astr: array[string]
astr := array[string]$create(1)

first, last: string
balance: int
first, last, balance := acct$query(acct_no)

8.3 Equates and Constants 

An equate allows a single identifier to be used as an abbreviation for a constant that may have 
a lengthy textual representation. We use the term constant in a very narrow sense here: constants, 
in addition to being immutable, must be computable at compile-time. Constants are either types 
(built-in or user-defined), or objects that are the results of evaluating constant expressions. 
(Constant expressions are defined below.) 

The syntax of equates is: 

equate ::= idn = constant
         | idn = type_set

constant ::= type_spec
           | expression

This section describes only the first form of equate; discussion of type_sets is deferred to 
Section 13.4. 

An equated identifier may be used as an expression. The value of such an expression is the 
constant to which the identifier is equated. An equated identifier may not be used as the target of 
an assignment. 

The scope of an equated identifier is the smallest scoping unit surrounding the equate defining 
it; here we mean the entire scoping unit, not just the portion after the equate. All the equates in a 
scoping unit must appear near the beginning of the scoping unit. The exact placement of equates 
depends on the containing syntactic construct; usually equates appear at the beginnings of bodies.

Equates may be in any order within the group. Thus, forward references among equates in
the same scoping unit are allowed, but cyclic dependencies are illegal. For example,

x = y
y = z
z = 3

is a legal sequence of equates, but

x = y
y = z
z = x

is not. Since equates introduce idns, the scoping restrictions on idns apply (i.e., the idns may not be
defined more than once).
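
The ordering rule for equates can be pictured as a dependency check. The Python sketch below is only a model of the rule, not a description of how any CLU compiler works; the function name resolve_equates and the dictionary representation of a group of equates are assumptions made for illustration. It resolves each identifier by following references in any order and rejects cyclic groups.

```python
def resolve_equates(equates):
    """Resolve a group of equates given as {idn: idn_or_literal}.

    A value that is itself a key of `equates` is treated as a reference
    to another equate; any other value is treated as a literal constant.
    Raises ValueError on a cyclic dependency.
    """
    resolved = {}

    def resolve(idn, in_progress):
        if idn in resolved:
            return resolved[idn]
        if idn in in_progress:
            raise ValueError("cyclic equates involving " + idn)
        value = equates[idn]
        if isinstance(value, str) and value in equates:
            # Forward or backward reference: textual order does not matter.
            value = resolve(value, in_progress | {idn})
        resolved[idn] = value
        return value

    for idn in equates:
        resolve(idn, frozenset())
    return resolved
```

Under these assumptions, resolve_equates({"x": "y", "y": "z", "z": 3}) resolves all three identifiers to 3, while a group in which z refers back to x raises ValueError.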

8.3.1 Abbreviations for Types 

Identifiers may be equated to type specifications, thus giving abbreviations for type names. 
For example: 

at = array[int]
ot = oneof[there: rt, none: null]
rt = record[a: foo, b: bar]
pt = proctype (int, int) returns (int) signals (overflow)
it = itertype (int, int, int) yields (int) signals (bounds)
istack = stack[int]
mt = mark_table

Notice that since equates may not have cyclic dependencies, directly recursive type specifications 
cannot be written. However, this does not prevent the definition of recursive types: clusters allow 
them to be written (see Section 13). 

8.3.2 Constant Expressions 

Here we define the subset of objects that equated identifiers may denote, by stating which 
expressions are constant expressions. (Expressions are discussed in detail in Section 10.) A constant 
expression is an expression that can be evaluated at compile-time to produce an immutable object
of a built-in type. Specifically this includes: 

1. Literals. 

2. Identifiers equated to constants.

3. Procedure and iterator names (see Section 10.3), including force[t] for
any type t.

4. Invocations of procedure operations of the built-in constant types, 
provided that all operands and all results are constant expressions. 
However, we explicitly forbid the use of formal parameters as operands 
to invocations in constant expressions, since the values of formal 
parameters are not known at compile-time. 

5. Formal parameters (see Section 13.4). 

For completeness, the list of the built-in constant types is: null, int, real, bool, char, string,
sequence types, oneof types, structure types, procedure types, and iterator types.

Some examples of equates involving expressions are: 

hash_modulus = 29
pi = 3.14159265
win = true
control_c = '\003'
prompt_string = "Input: "
nl = string$c2s('\n')
prompt = nl || prompt_string
prompt_len = string$size(prompt)
quarter = pi / 2.0
ftb = int$from_to_by
ot = oneof[cell: cell, none: null]
cell = record[first, second: int]
nilptr = ot$make_none(nil)

Note that the following equate is illegal because it uses a record constructor, which is not a constant
expression:

cell_1_2 = ot$make_cell(cell${first: 1, second: 2})

Any invocation in a constant expression must terminate normally; a program is illegal if
evaluation of any constant expression would signal an exception. (Exceptions are discussed in
Section 12.) Illegal programs will not be executed.



9. Assignment and Invocation 

Two fundamental actions of CLU are assignment of computed objects to variables, and 
invocation of procedures (and iterators) to compute objects. Other actions are composed from these 
two by using various control flow mechanisms. Since the correctness of assignments and 
invocations depends on a type-checking rule, we describe that rule first, then assignment, and
finally invocation.

9.1 Type Inclusion 

CLU is designed to allow compile-time type-checking. The type of each variable is known by 
the compiler. Furthermore, the type of objects that could result from the evaluation of any 
expression (invocation) is known at compile-time. Hence, every assignment can be checked at 
compile-time to make sure that the variable is only assigned objects of its declared type. The rule 
is that an assignment v := E is legal only if the set of objects defined by the type of E (loosely, the
set of all objects that could possibly result from evaluating the expression) is included in the set of 
all objects that could be denoted by v. 

Instead of speaking of the set of objects defined by a type, we generally speak of the type and 
say that the type of the expression must be included in the type of the variable. If it were not for 
the type any, the inclusion rule would be an equality rule. This leads to a simple interpretation of
the type inclusion rule: 

The type of a variable being assigned an expression must be either the type of the 
expression, or any. 
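
The simplified rule is small enough to state as a predicate. The following Python sketch is an illustration only: representing types as strings and the name assignment_is_legal are assumptions, not part of CLU.

```python
ANY = "any"  # stands in for CLU's type any


def assignment_is_legal(variable_type, expression_type):
    """Type inclusion rule: the variable's type must be the expression's
    type, or any."""
    return variable_type == ANY or variable_type == expression_type
```

For instance, assigning an int expression to a variable of type any is accepted, while assigning an expression of type any to an int variable is rejected.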

9.2 Assignment 

Assignment is the means of causing a variable to denote an object. Some assignments are 
implicit, i.e., performed as part of the execution of various mechanisms of the language (most
notably procedure invocation, iterator invocation, exception handling, and the tagcase statement). 
All assignments, whether implicit or explicit, are subject to the type inclusion rule. The remainder 
of this section discusses explicit assignments. 

The assignment symbol ":=" is used in two other syntactic forms that are not true assignments,
but rather abbreviations for certain invocations. These forms are used for updating collections
such as records and arrays (see Section 11.2).

9.2.1 Simple Assignment 

The simplest form of assignment is:

idn := expression

In this case the expression is evaluated, and the resulting object is assigned to the variable. The
expression must return a single object (whose type must be included in that of the variable).
Examples of simple assignments are:

x := 1                         % x's type must include int, i.e., it must be int or any
y := string$substr(s, 5, n)    % y's type must include string
a := array[int]$new()          % a's type must include array[int]
p := array[int]$create(3)      % p's type must include array[int]
z := (foo = bar)               % z's type must include bool

It is also possible to declare a variable and assign to it in a single statement; this is called a 
declaration with initialization, and was discussed in Section 8.2.2. 

9.2.2 Multiple Assignment 

There are two forms of assignment that assign to more than one variable at once: 

idn , ... := expression , ...

and

idn , ... := invocation

The first form of multiple assignment is a generalization of the simple assignment. The first
variable is assigned the first expression, the second variable the second expression, and so on. The
expressions are all evaluated (from left to right) before any assignments are performed. The
number of variables in the list must equal the number of expressions, no variable may occur more
than once, and the type of each variable must include the type of the corresponding expression.

This form of multiple assignment allows easy permutation of the objects denoted by several
variables:

x, y := y, x
i, j, k := j, k, i

and similar simultaneous assignments of variables that would otherwise require temporary
variables:

a, b := (a + b), (a - b)
quotient, remainder := (u / v), (u // v)

There is no form of this statement with declarations. 
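
The "evaluate everything, then assign" behavior of the first form can be observed in any language with simultaneous assignment. The sketch below uses Python's tuple assignment as an analogy; it is not CLU, and the helper trace is a hypothetical name introduced only to record the order of evaluation.

```python
def trace(log, label, value):
    """Record that an expression was evaluated, then return its value."""
    log.append(label)
    return value


log = []
x, y = 1, 2
# Both right-hand expressions are evaluated (left to right) before either
# variable is rebound, so the swap needs no temporary variable.
x, y = trace(log, "first", y), trace(log, "second", x)

a, b = 7, 3
a, b = (a + b), (a - b)   # the right-hand side sees the old a and b
```

After this runs, x and y have exchanged their objects, and a and b hold the sum and difference of their previous values.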

The second form of multiple assignment allows one to retain the objects resulting from an
invocation returning two or more objects. The first variable is assigned the first object, the second
variable the second object, and so on. The order of the objects is the same as in the return
statement of the invoked routine. The number of variables must equal the number of objects
returned, no variable may occur more than once, and the type of each variable must include the
corresponding return type of the invoked procedure. Note that the right-hand side is syntactically
restricted to simple invocations (see Section 10.4); sugared invocations (see Sections 10.5, 10.7) are not
allowed.

Two examples of this form of assignment are:

first, last, balance := acct$query(acct_no)
x, y, z := vector$components(v)

9.3 Invocation 

Invocation is the other fundamental action of CLU. In this section we discuss procedure 
invocation; iterator invocation is discussed in Section 1135. However, up to and including passing
of arguments, the two are the same.

Invocations take the form: 

primary ( [ expression , ... ] )

A primary is a slightly restricted form of expression, which includes variables and routine names,
among other things. (See the next section.)

The sequence of activities in performing an invocation is as follows:

1. The primary is evaluated. It must evaluate to a procedure or iterator. 

2. The expressions are evaluated, from left to right.

3. New variables are introduced corresponding to the formal arguments
of the routine being invoked (i.e., a new environment is created for the
invoked routine to execute in).

4. The objects resulting from evaluating the expressions (the actual
arguments) are assigned to the corresponding new variables (the formal
arguments). The first formal is assigned the first actual, the second
formal the second actual, and so on. The type of each expression must
be included in the type of the corresponding formal argument.

5. Control is transferred to the routine at the start of its body. 

An invocation is considered legal in exactly those situations where all the (implicit) assignments 
involved in its execution are legal. 

It is permissible for a routine to assign an object to a formal argument variable; the effect is 
just as if that object were assigned to any other variable. From the point of view of the invoked 
routine, the only difference between its formal argument variables and its other local variables is
that the formals are initialized by its caller.

Procedures can terminate in two ways: they can terminate normally, returning zero or more
objects, or they can terminate exceptionally, signalling an exceptional condition. When a procedure 
terminates normally, the result objects become available to the caller, and will (usually) be assigned 
to variables or passed as arguments to other routines. When a procedure terminates exceptionally, 
the flow of control will not go to the point of return of the invocation, but rather will go elsewhere 
as described in Section 12. 
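
The argument-passing steps described earlier in this section (actuals evaluated left to right, then assigned to fresh formal variables) have a close analogue in Python, which is used here purely as an illustration. The function and variable names below are hypothetical.

```python
order = []


def arg(label):
    """Record when an argument expression is evaluated."""
    order.append(label)
    return label


def pair(first, second):
    return (first, second)


pair(arg("a"), arg("b"))          # actuals are evaluated left to right


def append_then_rebind(formal):
    # Caller and callee denote the same object, so a change made to the
    # object itself is visible to the caller.
    formal.append(99)
    # Assigning to the formal only changes what this local variable
    # denotes; the caller's variable is unaffected.
    formal = []
    return formal


actual = [1, 2, 3]
result = append_then_rebind(actual)
```

Here actual still denotes the original (now four-element) list after the call, while result denotes the new empty list that was assigned to the formal.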

Some examples of invocations are:

p()                            % invoking a procedure taking no arguments
array[int]$create(-1)          % invoking an operation of a type
routine_table[index](input)    % invoking a procedure fetched from an array



10. Expressions 

An expression evaluates to an object in the CLU universe. This object is said to be the result 
or value of the expression. Expressions are used to name the object to which they evaluate. The 
simplest forms of expressions are literals, variables, and routine names. These forms directly name 
their result object. More complex expressions are generally built up out of nested procedure 
invocations. The result of such an expression is the value returned by the outermost invocation. 

Like many other languages, CLU has prefix and infix operators for the common arithmetic 
and comparison operations, and uses the familiar syntax for array indexing and record component 
selection (e.g., a[i] and r.s). However, in CLU these notations are considered to be abbreviations
for procedure calls. This allows built-in types and user-defined types to be treated as uniformly as 
possible, and also allows the programmer to use familiar notation when appropriate. 

In addition to invocation, four other forms are used to build complex expressions out of 
simpler ones. These are the conditional operators cand and cor (see Section 10.8), and the type 
conversion operations up and down (see Section 10.10). 

There is a syntactically restricted form of expression called a primary. A primary is any 
expression that does not have a prefix or infix operator, or parentheses, at the top level. In certain 
places, the syntax requires a primary rather than a general expression. This has been done to 
increase the readability of the resulting programs.

As a general rule, procedures with side effects should not be used in expressions, and programs
should not depend on the order in which expressions are evaluated. However, to avoid surprises, 
the subexpressions of any expression are evaluated from left to right. 

The various forms of expressions are explained below. 



10.1 Literals 

Integer, real, character, string, boolean and null literals are expressions. The syntax for literals 
is given in Sections 7.1 to 7.6. The type of a literal expression is the type of the object named by 
the literal. For example, true is of type bool, "abc" is of type string, etc. 

10.2 Variables 

Variables are identifiers that name objects of a given type. The type of a variable is the type 
given in the declaration of that variable, and determines which objects may be named by the 
variable. 

10.3 Procedure and Iterator Names 

Procedures and iterators may be defined either as separate modules, or within a cluster. Those 
defined as separate modules are named by expressions of the form: 

idn [ [ constant , ... ] ]

The optional constants are the parameters of the procedure or iterator abstraction. (Constants were
discussed in Section 8.3.)

When a procedure or iterator is defined as an operation of a type, that type must be part of
the name of the routine. The form for naming an operation of a type is:

type_spec $ name [ [ constant , ... ] ]

The type of a procedure or iterator name is just the type of the named routine. Some
examples of procedure and iterator names are:

primes
sort[int]
int$add
array[bool]$elements

10.4 Procedure Invocations

Procedure invocations have the form 

primary ( [ expression , ... ] )

The primary is evaluated to obtain a procedure object, and then the expressions are evaluated left- 
to-right to obtain the argument objects. The procedure is invoked with these arguments, and the 
object returned is the result of the entire expression. For more discussion see Section 9.3. 

The following expressions are invocations: 

p(x)
int$add(a, b)
within[3.2](7.1, .003e7)

Any procedure invocation P(E_1, ... E_n) must satisfy two constraints: the type of P must be of
the form

proctype (T_1, ... T_n) returns (R) signals (...)

and the type of each expression E_i must be included in the corresponding type T_i. The type of the
entire invocation expression is given by R.

Procedures can also be invoked as statements (see Section 11.1). 

10.5 Selection Operations 

Arrays, sequences, records, and structures are collections of objects. Selection operations 
provide access to the individual elements or components of the collection. Simple notations are 
provided for invoking the fetch and store operations of array types, the fetch operation of sequence
types, the get and set operations of record types, and the get operations of structure types. In 
addition, these "syntactic sugarings" for selection operations may be used for user-defined types 
with the appropriate properties. 

10.5.1 Element Selection 

An element selection expression has the form: 

primary [ expression ]

This form is just syntactic sugar for an invocation of a fetch operation, and is completely
equivalent to:

T$fetch(primary, expression)

where T is the type of primary. For example, if a is an array of integers, then

a[27]

is completely equivalent to the invocation

array[int]$fetch(a, 27)

When primary is an array[S] or sequence[S] for some type S, expression must be an int and
the result has type S. However, the element selection expression is not restricted to arrays and 
sequences. The expression is legal whenever the corresponding invocation is legal. In other words, 
T (the type of primary) must provide a procedure operation named fetch, which takes two 
arguments whose types include the types of primary and expression, and which returns a single 
result. 

The use of fetch for user-defined types should be restricted to types with array-like behavior. 
Objects of such types will contain (along with other information) a collection of objects, where the 
collection can be indexed in some way. For example, it might make sense for an 
associative_memory type to provide a fetch operation to access the value associated with a key. 
Fetch operations are intended for use in expressions; thus they should never have side-effects. 
Array-like types may also provide a store operation (see Section 11.2.1). 
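
Python offers a comparable sugaring, which may help readers picture the expansion: an index expression is rewritten into a call on an operation supplied by the type of the indexed object. The sketch below is an analogy only; the class AssociativeMemory and its store method are hypothetical, and Python's __getitem__ merely plays the role that fetch plays in CLU.

```python
class AssociativeMemory:
    """Hypothetical user-defined type with array-like behavior."""

    def __init__(self):
        self._items = {}

    def store(self, key, value):
        self._items[key] = value

    def __getitem__(self, key):
        # Plays the role of the fetch operation; it has no side effects.
        return self._items[key]


m = AssociativeMemory()
m.store("pi", 3.14159)
sugared = m["pi"]                                   # like  primary [ expression ]
expanded = AssociativeMemory.__getitem__(m, "pi")   # like  T$fetch(primary, expression)
```

Both forms call the same operation and yield the same object.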

10.5.2 Component Selection 

The component selection expression has the form: 

primary . name

This form is just syntactic sugar for an invocation of a get_name operation, and is completely
equivalent to:

T$get_name(primary)

where T is the type of primary. For example, if x has type record[first: int, second: real], then

x.first

is completely equivalent to

record[first: int, second: real]$get_first(x)

When T is a record or structure type, then T must have a selector called name, and the type of
the result will be the type of the component named by that selector. However, the component
selection expression is not restricted to records and structures. The expression is legal whenever the
corresponding invocation is legal. In other words, T (the type of primary) must provide a
procedure operation named get_name, which takes one argument whose type includes the type of
primary, and which returns a single result.

The use of get operations for user-defined types should be restricted to types with record-like 
behavior. Objects of such types will contain (along with other information) one or more named 
objects. For example, it might make sense for a file type to provide a get_author operation, which 
returns the name of a file's creator. Get operations are intended for use in expressions; thus they 
should never have side-effects. 

Types with named components may also provide set operations (see Section 11.2.2). 
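
As an analogy for component selection on a user-defined type, the Python sketch below uses a read-only property, so that dotted access is rewritten into a call on an operation of the object's type. The class File and its author component are hypothetical and only echo the get_author example above; this is not CLU code.

```python
class File:
    """Hypothetical record-like type exposing one named component."""

    def __init__(self, author):
        self._author = author

    @property
    def author(self):
        # Plays the role of a get_author operation; it has no side effects.
        return self._author


f = File("liskov")
sugared = f.author                 # like  primary . name
expanded = File.author.fget(f)     # like  T$get_name(primary)
```

The dotted form and the explicit call return the same object.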

10.6 Constructors 

Constructors are expressions that enable users to create and initialize arrays, sequences, records, 
and structures. Constructors are not provided for user-defined types. 

10.6.1 Array Constructors 

An array constructor has the form: 

type_spec $ [ [ expression : ] [ expression , ... ] ]

The type specification must name an array type: array[T]. This is the type of the constructed
array. The expression preceding the ":" must evaluate to an integer, and becomes the low bound of 
the constructed array. If this expression is omitted, the low bound is 1. The expressions following 
the ":" are evaluated to obtain the elements of the array. They correspond (left to right) to the 
indexes low_bound, low_bound+1, low_bound+2, ... For an array of type array[T], the type of each
element expression in the constructor must be included in T. 

For example, the expression 

array[bool] $ [79: true, false]

constructs a new boolean array with two elements: true (at index 79), and false (at index 80). The 
expression 

array[ai] $ [ai$[], ai$[]]

(where ai is equated to array[int]) creates two distinct integer arrays, both empty, and creates a
third array to hold them. The low bound of each array is 1. 

An array constructor is computationally equivalent to an array create operation, followed by a 
number of array addh operations. However, such a sequence of operations cannot be written as an 
expression. 
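
The equivalence to a create followed by addh operations can be sketched with a small model. The Python class below is an assumption made for illustration (CluArray, construct, and the use of IndexError for a bounds failure are not part of CLU); it models only the low bound, addh, and fetch.

```python
class CluArray:
    """Minimal model of an array with an explicit low bound."""

    def __init__(self, low_bound=1):
        self.low = low_bound      # index of the first element
        self.items = []

    def addh(self, element):
        """Add an element at the high end."""
        self.items.append(element)

    def fetch(self, index):
        offset = index - self.low
        if not 0 <= offset < len(self.items):
            raise IndexError("bounds")
        return self.items[offset]


def construct(low_bound, elements):
    """Model of  array[T]$[low_bound: e1, e2, ...]  as create plus addh."""
    a = CluArray(low_bound)
    for e in elements:
        a.addh(e)
    return a


flags = construct(79, [True, False])
```

In this model flags.fetch(79) is true and flags.fetch(80) is false, matching the boolean array example above; any other index is out of bounds.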

10.6.2 Sequence Constructors

A sequence constructor has the form: 

type_spec $ [ [ expression , ... ] ]

The type specification must name a sequence type: sequence[T]. This is the type of the
constructed sequence. The expressions are evaluated to obtain the elements of the sequence. They
correspond (left to right) to the indexes 1, 2, 3, ... For a sequence of type sequence[T], the type of
each element expression in the constructor must be included in T.

A sequence constructor is computationally equivalent to a sequence new operation, followed by a
number of sequence addh operations. 

10.6.3 Record Constructors 

A record constructor has the form: 

type_spec $ { field , ... }

where

field ::= name , ... : expression

Whenever a field has more than one name, it is equivalent to a sequence of fields, one for each
name. Thus, the following two constructors are equivalent:

R = record[a: int, b: int, c: int]
R${a, b: 7, c: 9}
R${a: 7, b: 7, c: 9}

In a record constructor, the type specification must name a record type:
record[S_1:T_1, ..., S_n:T_n]. This will be the type of the constructed record. The component names
in the field list must be exactly the names S_1, ..., S_n, although these names may appear in any
order. The expressions are evaluated left to right, and there is one evaluation per component
name even if several component names are grouped with the same expression. The type of the
expression for component S_i must be included in T_i. The results of these evaluations form the
components of a newly constructed record. This record is the value of the entire constructor
expression.

As an example, consider the following record constructor:

AS = array[string]
RT = record[list1, list2: AS, item: int]
RT${item: 2, list1, list2: AS$["Susan", "George", "Jan"]}

This produces a record that contains an integer and two distinct (but similar) arrays. The arrays
are distinct because the array constructor expression is evaluated twice, once for list1 and once for
list2.

A record constructor is computationally equivalent to a record create operation (see 
Appendix II), but that operation is not available to the user. 
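
The rule of one evaluation per component name can be modeled as follows. This Python sketch is illustrative only: construct_record is a hypothetical helper, records are modeled as dictionaries, and each field expression is modeled as a zero-argument function so that it can be evaluated once for every name it is grouped with.

```python
def construct_record(fields):
    """Build a record from a list of (names, thunk) pairs.

    `thunk` is a zero-argument function standing for the field's
    expression. It is evaluated once per name, left to right.
    """
    record = {}
    for names, thunk in fields:
        for name in names:
            record[name] = thunk()
    return record


rt = construct_record([
    (["item"], lambda: 2),
    (["list1", "list2"], lambda: ["Susan", "George", "Jan"]),
])
```

In this model rt["list1"] and rt["list2"] have equal contents but are distinct objects, which mirrors the two distinct arrays in the record constructor example.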

10.6.4 Structure Constructors 

A structure constructor has the form: 

type_spec $ { field , ... }

where (as for records) 

field ::= name , ... : expression 
Whenever a field has more than one name, it is equivalent to a sequence of fields, one for each 
name. 

In a structure constructor, the type specification must name a structure type:
struct[S_1:T_1, ..., S_n:T_n]. This will be the type of the constructed structure. The component
names in the field list must be exactly the names S_1, ..., S_n, although these names may appear in
any order. The expressions are evaluated left to right, and there is one evaluation per component
name even if several component names are grouped with the same expression. The type of the
expression for component S_i must be included in T_i. The results of these evaluations form the
components of a newly constructed structure. This structure is the value of the entire constructor 
expression. 

A structure constructor is computationally equivalent to a structure create operation (see 
Appendix II), but that operation is not available to the user. 

10.7 Prefix and Infix Operators 

CLU allows infix and prefix notation to be used as a shorthand for the following operations. 
The table shows the shorthand form and the equivalent expanded form for each operation. For 
each operation, the type T is the type of the first operand.

Shorthand form        Expansion

expr1 ** expr2        T$power(expr1, expr2)
expr1 // expr2        T$mod(expr1, expr2)
expr1 / expr2         T$div(expr1, expr2)
expr1 * expr2         T$mul(expr1, expr2)
expr1 || expr2        T$concat(expr1, expr2)
expr1 + expr2         T$add(expr1, expr2)
expr1 - expr2         T$sub(expr1, expr2)
expr1 < expr2         T$lt(expr1, expr2)
expr1 <= expr2        T$le(expr1, expr2)
expr1 = expr2         T$equal(expr1, expr2)
expr1 >= expr2        T$ge(expr1, expr2)
expr1 > expr2         T$gt(expr1, expr2)
expr1 ~< expr2        ~ (expr1 < expr2)
expr1 ~<= expr2       ~ (expr1 <= expr2)
expr1 ~= expr2        ~ (expr1 = expr2)
expr1 ~>= expr2       ~ (expr1 >= expr2)
expr1 ~> expr2        ~ (expr1 > expr2)
expr1 & expr2         T$and(expr1, expr2)
expr1 | expr2         T$or(expr1, expr2)
- expr                T$minus(expr)
~ expr                T$not(expr)

Operator notation is used most heavily for the built-in types, but may be used for 
user-defined types as well. When these operations are provided for user-defined types, they 
should always be side-effect free, and they should mean roughly the same thing as they do for the 
built-in types. For example, the comparison operations should only be used for types that have a 
natural partial or total order. Usually, the comparison operations (lt, le, equal, ge, gt) will be of type

proctype (T, T) returns (bool)

the other binary operations (e.g., add, sub) will be of type

proctype (T, T) returns (T) signals (...)

and the unary operations will be of type

proctype (T) returns (T) signals (...)
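
Operator shorthand for a user-defined type can be illustrated by analogy with Python, where an infix operator is likewise rewritten into a call on an operation of the first operand's type. The class Money below is hypothetical, and Python's full dispatch rules (which can also consult the second operand) differ from the simpler CLU rule; the sketch shows only the common case.

```python
class Money:
    """Hypothetical user-defined type supporting + and <."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):      # plays the role of T$add
        return Money(self.cents + other.cents)

    def __lt__(self, other):       # plays the role of T$lt
        return self.cents < other.cents


a, b = Money(150), Money(275)
sugared = a + b                    # shorthand form
expanded = Money.__add__(a, b)     # expansion via the first operand's type
```

Both operations are side-effect free: a and b are unchanged, and each form produces a new object with the same value.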

10.8 Cand and Cor

Two additional binary operators are provided. These are the conditional and operator, cand, 
and the conditional or operator, cor. 


expression1 cand expression2

is the boolean and of expression1 and expression2. However, if expression1 is false, expression2 is
never evaluated.

expression1 cor expression2

is the boolean or of expression1 and expression2, but expression2 is not evaluated unless
expression1 is false. For both cand and cor, expression1 and expression2 must have type bool.

Conditional expressions can be used to avoid run-time errors. For example, the following
boolean expressions can be used without fear of "bounds" or "zero_divide" errors:

(low_bound <= i) cand (i <= high_bound) cand (A[i] ~= 0)
(n = 0) cor (1000 // n = 0)

Because of the conditional expression evaluation involved, uses of cand and cor are not 
equivalent to any procedure invocation. 
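
Python's and and or operators are conditional in the same way, so the two guarded tests above can be mirrored directly. The sketch is an analogy rather than CLU code; the function names are hypothetical, the list values stands for the elements at indexes low_bound, low_bound + 1, and so on, and Python's % is used where CLU writes //.

```python
def nonzero_at(values, low_bound, i):
    """True if index i is in bounds and the element there is nonzero."""
    high_bound = low_bound + len(values) - 1
    # `and` skips the element access when an earlier test is false,
    # just as cand would.
    return low_bound <= i and i <= high_bound and values[i - low_bound] != 0


def zero_or_divides_1000(n):
    # `or` skips the modulus when n is zero, just as cor would,
    # so no division by zero can occur.
    return n == 0 or 1000 % n == 0
```

Calling nonzero_at with an out-of-range index simply returns false instead of failing, and zero_or_divides_1000(0) returns true without attempting the division.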

10.9 Precedence

When an expression is not fully parenthesized, the proper nesting of subexpressions might be 
ambiguous. The following precedence rules are used to resolve such ambiguity. The precedence 
of each infix operator is given in the table below. Higher precedence operations are performed 
first. Prefix operators always have precedence over infix operators. 

The precedence for infix operators is as follows: 

Precedence    Operators

cor

The order of evaluation for operators of the same precedence is left to right, except for **,
which is right to left.

The following examples illustrate the precedence rules. 

Expression            Equivalent Form

a + b // c            a + (b // c)
a + b - c             (a + b) - c
a + b ** c ** d       a + (b ** (c ** d))
a = b | c = d         (a = b) | (c = d)
- a * b               (- a) * b

10.10 Up and Down 

There are no implicit type conversions in CLU. Two forms of expression exist for explicit 
conversions. These are: 

    up ( expression )
    down ( expression )

Up and down may be used only within the body of a cluster operation. Up changes the type 
of the expression from the representation type of the cluster to the abstract type. Down converts 
the type of the expression from the abstract type to the representation type. These conversions will 
be explained further in Section 13.3. 
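
A minimal sketch of how up and down might appear inside a cluster is given below. The cluster
counter and its operations are hypothetical, chosen only for illustration; the full rules for
clusters are those of Section 13.3.

```clu
% Hypothetical cluster: a counter whose representation is an int.
counter = cluster is create, next, value
    rep = int

    create = proc () returns (counter)
        return(up(0))               % rep (int) to abstract (counter)
        end create

    next = proc (c: counter) returns (counter)
        return(up(down(c) + 1))     % abstract to rep, add, back to abstract
        end next

    value = proc (c: counter) returns (int)
        return(down(c))
        end value
    end counter
```

Outside the cluster, a counter cannot be treated as an int, since up and down are not
available there.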

10.11 Force 

CLU has a single built-in procedure generator called force. Force takes one type parameter, 
and is written 

    force [ type_spec ]
The procedure force[T] has type

    proctype (any) returns (T) signals (wrong_type)
If force[T] is applied to an object that is included in type T, then it returns that object. If
force[T] is applied to an object that is not in type T, then it signals "wrong_type" (see Section 12).

Force is a necessary companion to the type any. The type any allows programs to pass
around objects of arbitrary type. However, to do anything substantive with an object, one must
use the primitive operations of that object's type. This raises a conflict with compile-time
type-checking, since an operation can be applied only when the arguments are known to be of the
correct types. This conflict is resolved by using force. Force[T] allows a program to check, at
run-time, that a particular object is actually of type T. If this check succeeds, then the object can
be used in all the ways appropriate for objects of type T.

For example, the procedure force[T] allows us to legally write the following code:

    x: any := 3
    y: int := force[int](x)
while the following is illegal:

    x: any := 3
    y: int := x
because the type of y (int) does not include the type of the expression x (any).
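
Since force[T] may signal wrong_type, a caller that cannot be sure of the object's type will
usually attach a handler (see Section 12). The following sketch assumes a hypothetical procedure
int_or_zero, which is not part of CLU:

```clu
% Hypothetical procedure: returns the int held in x, or 0 if x is not an int.
int_or_zero = proc (x: any) returns (int)
    return(force[int](x))
        except when wrong_type: return(0) end
    end int_or_zero
```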



11. Statements 

In this section, we describe most of the statements of CLU. We omit discussion of the signal, 
exit, and except statements, which are used for signalling and handling exceptions, as described 
in Section 12. 

CLU is a statement-oriented language, i.e., statements are executed for their side-effects and
do not return any values. Most statements are control statements that permit the programmer to 
define how control flows through the program. The real work is done by the simple statements: 
assignment and invocation. Assignment has already been discussed in Section 9; the invocation 
statement is discussed in Section 11.1 below. Two special statements that look like assignments but 
are really invocations are discussed in Section 11.2. 

The syntax of CLU is defined to permit a control statement to control a group of equates, 
declarations, and statements rather than just a single statement. Such a group is called a body, and 
has the form 

    body ::= { equate }
             { statement }        % statements include declarations

Scope rules for bodies were discussed in Section 8.1. No special terminator is needed to signify the 
end of a body; reserved words used in the various compound statements serve to delimit the bodies. 
Occasionally it is necessary to explicitly indicate that a group of statements should be treated like a 
single statement; this is done by the block statement, discussed in Section 11.3. 

The conditional statement is discussed in Section 11.4. Loop statements are discussed in
Section 11.5, as are some special statements that control termination of a single iteration or a single 
loop. The tagcase statement is discussed in Section 11.6. Finally, the return statement is 
discussed in Section 11.7, and the yield statement in Section 11.8. 

11.1 Procedure Invocation 

An invocation statement invokes a procedure. Its form is the same as an invocation
expression:

    primary ( [ expression , ... ] )
The primary must evaluate to a procedure object, and the type of each expression must be included
in the type of the corresponding formal argument for that procedure. The procedure may or may
not return results; if it does return results, they are discarded.

For example, the statement

    array[int]$remh(a)
will remove the top element of a (assuming a is an array[int]). Remh also returns the top element,
but it is discarded in this case.

11.2 Update Statements 

Two special statements are provided for updating components of records and arrays. In 
addition they may be used with user-defined types with the appropriate properties. These 
statements resemble assignments syntactically, but are really invocations. 

11.2.1 Element Update 

The element update statement has the form 

    primary [ expression1 ] := expression2
This form is merely syntactic sugar for an invocation of a store operation, and is completely
equivalent to the invocation statement

    T$store(primary, expression1, expression2)
where T is the type of primary. For example, if a is an array of integers,

    a[27] := 3
is completely equivalent to the invocation statement

    array[int]$store(a, 27, 3)

The element update statement is not restricted to arrays. The statement is legal if the
corresponding invocation statement is legal. In other words, T (the type of primary) must provide
a procedure operation named store, which takes three arguments whose types include those of
primary, expression1, and expression2, respectively. In case primary is an array[S] for some type S,
expression1 must be an integer, and expression2 must be included in S.

We recommend that the use of store for user-defined types be restricted to types with 
array-like behavior, i.e., types whose objects contain mutable collections of indexable elements. For 
example, it might make sense for an associative_memory type to provide a store operation for
changing the value associated with a key. Such types may also provide a fetch operation (see 
Section 10.5.1). 
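
As a sketch of such an array-like type, consider the hypothetical cluster sparse below, which
maps integer indexes to integers and treats absent indexes as 0. All names are assumptions made
for illustration. The cluster uses cvt in its headings, which is explained in Section 13.3.

```clu
% Hypothetical cluster: a sparse vector of ints, indexed by int, default 0.
sparse = cluster is create, fetch, store
    pair = record[index: int, value: int]
    rep = array[pair]

    create = proc () returns (cvt)
        return(rep$new())
        end create

    fetch = proc (s: cvt, i: int) returns (int)
        for p: pair in rep$elements(s) do
            if p.index = i then return(p.value) end
            end
        return(0)
        end fetch

    store = proc (s: cvt, i: int, v: int)
        for p: pair in rep$elements(s) do
            if p.index = i then p.value := v return end
            end
        rep$addh(s, pair${index: i, value: v})
        end store
    end sparse
```

Given s of type sparse, the statement s[3] := 7 is then legal and is equivalent to
sparse$store(s, 3, 7).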

11.2.2 Component Update 

The component update statement has the form 

    primary . name := expression
This form is merely syntactic sugar for an invocation of a set_name operation, and is completely
equivalent to the invocation statement

    T$set_name(primary, expression)
where T is the type of primary. For example, if x has type record[first: int, second: real], then

    x.first := 6
is completely equivalent to

    record[first: int, second: real]$set_first(x, 6)
The component update statement is not restricted to records. The statement is legal if the 
corresponding invocation statement is legal. In other words, T (the type of primary) must provide 
a procedure operation called set_name, which takes two arguments whose types include the types of 
primary and expression, respectively. When T is a record type, then T must have a selector called 
name, and the type of expression must be included in the type of the component named by that 
selector. 

We recommend that set operations be provided for user-defined types only if record-like 
behavior is desired, i.e., it is meaningful to permit some parts of the abstract object to be modified 
by selector name. In general, set operations should not perform any substantial computation, except 
possibly checking that the arguments satisfy certain constraints. For example, in a bank account 
type, there might be a set_min_balance operation to set what the minimum balance in the account
must be. However, deposit and withdraw operations make more sense than a set_balance operation,
even though the set_balance operation could compute the amount deposited or withdrawn and 
enforce semantic constraints. 

In our experience, types with set operations occur less frequently than types with get operations 
(see Section 10.5.2). 

11.3 Block Statement 

The block statement permits a sequence of statements to be grouped together into a single 
statement. Its form is 

begin body end 
Since the syntax already permits bodies inside control statements, the main use of the block 
statement is to group statements together for use with the except statement; see Section 12. 

11.4 Conditional Statement 

The form of the conditional statement is 
    if expression then body
        { elseif expression then body }
        [ else body ]
        end
The expressions must be of type bool. They are evaluated successively until one is found to be
true. The body corresponding to the first true expression is executed, and the execution of the if
statement then terminates. If none of the expressions is true, then the body in the else clause is
executed (if the else clause exists). The elseif form provides a convenient way to write a
multi-way branch.
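
For instance, a three-way branch might be written as in the following sketch (the procedure
sign is hypothetical):

```clu
% Hypothetical procedure: the sign of i as -1, 0, or 1.
sign = proc (i: int) returns (int)
    if i < 0 then return(-1)
    elseif i > 0 then return(1)
    else return(0)
    end
    end sign
```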

11.5 Loop Statements 

There are two forms of loop statements: the while statement and the for statement. Also 
provided are a continue statement, to terminate the current cycle of a loop, and a break statement, 
to terminate the innermost loop. These are discussed below. 

11.5.1 While Statement

The while statement has the form: 
while expression do body end 
Its effect is to repeatedly execute the body as long as the expression remains true. The expression 
must be of type bool. If the value of the expression is true, the body is executed, and then the 
entire while statement is executed again. When the expression evaluates to false, execution of the 
while statement terminates. 
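
A small sketch of a while statement follows. The procedure gcd is hypothetical and assumes
both arguments are positive; it uses the infix // (int$mod) and a multiple assignment.

```clu
% Hypothetical procedure: greatest common divisor of two positive ints.
gcd = proc (n, m: int) returns (int)
    while m ~= 0 do
        n, m := m, n // m
        end
    return(n)
    end gcd
```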

11.5.2 For Statement 

The only way an iterator (see Section 13.2) can be invoked is by use of a for statement. The 
iterator produces a sequence of items (where an item is a group of zero or more objects) one item at
a time; the body of the for statement is executed for each item in the sequence. 

The for statement has the form: 

    for [ idn , ... ] in invocation do body end
or
    for [ decl , ... ] in invocation do body end
The invocation must be an iterator invocation. The idn form uses previously declared variables to 
serve as the loop variables, while the decl form introduces new variables, local to the for statement, 
for this purpose. In either case, the type of each variable must include the corresponding yield 
type of the invoked iterator. 

Execution of the for statement proceeds as follows. First the iterator is invoked, and it either 
yields an item or terminates. If the iterator yields an item, its execution is temporarily suspended, 
the objects in the item are assigned to the loop variables, the body of the for statement is executed, 
and then execution of the iterator is resumed (from the point of suspension). Whenever the 
iterator terminates, the entire for statement terminates. 

An example of a for statement is 
    a: array[int]
    sum: int := 0
    for x: int in array[int]$elements(a) do
        sum := sum + x
        end
which will compute the sum of all the integers in an array of integers. This example makes use of
the elements iterator on arrays, which yields the elements of the array one by one.
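
The same pattern works with user-defined iterators. The sketch below assumes a hypothetical
iterator count_up and a hypothetical procedure sum_to that uses it; iterators themselves are
described in Section 13.2.

```clu
% Hypothetical iterator: yields lo, lo+1, ..., hi (nothing if lo > hi).
count_up = iter (lo, hi: int) yields (int)
    while lo <= hi do
        yield(lo)
        lo := lo + 1
        end
    end count_up

% Hypothetical procedure: the sum 1 + 2 + ... + n, using the decl form of for.
sum_to = proc (n: int) returns (int)
    sum: int := 0
    for i: int in count_up(1, n) do
        sum := sum + i
        end
    return(sum)
    end sum_to
```

Each time count_up executes yield, it is suspended while the body of the for statement runs;
when count_up reaches the end of its body, the for statement terminates.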

11.5.3 Continue Statement 

The continue statement has the form 
continue 
Its effect is to terminate execution of the body of the smallest loop statement in which it appears, 
and to start the next cycle of that loop (if any). 
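
For example, assuming a is an array[int], the following fragment (a sketch, not taken from any
library) sums only the positive elements of a:

```clu
sum: int := 0
for x: int in array[int]$elements(a) do
    if x <= 0 then continue end     % skip this element; resume the iterator
    sum := sum + x
    end
```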

11.5.4 Break Statement 

The break statement has the form 
break 
Its effect is to terminate execution of the smallest loop statement in which it appears. Execution
continues with the statement following that loop. 

For example, 

    sum: int := 0
    for x: int in array[int]$elements(a) do
        sum := sum + x
        if sum >= 100
           then sum := 100 break end
        end

computes the minimum of 100 and the sum of the integers in a. Note that execution of the break 
statement will terminate both the iterator and the for loop, continuing with the statement following 
the for loop. 

11.6 Tagcase Statement

The tagcase statement is a special statement provided for decomposing oneof and variant 
objects. Recall that a oneof or variant type is a discriminated union, and each object contains a tag 
and some other object called the value (see Sections 7.12 and 7.13). The tagcase statement permits 
the selection of a body to perform based on the tag of the object. 

The form of the tagcase statement is 

    tagcase expression
        tag_arm { tag_arm }
        [ others : body ]
        end
where

    tag_arm ::= tag name , ... [ ( idn: type_spec ) ] : body
The expression must evaluate to a oneof or variant object. The tag of this object is then matched
against the names on the tag_arms. When a match is found, if a declaration (idn: type_spec) exists,
the value component of the object is assigned to the local variable idn. The matching body is then 
executed; idn is defined only in that body. If no match is found, the body in the others arm is 
executed. 

In a syntactically correct tagcase statement, the following constraints are satisfied. The type 
of the expression must be some oneof or variant type, T. The tags named in the tag_arms must be 
a subset of the tags of T, and no tag may occur more than once. If all tags of T are present, there 
is no others arm; otherwise an others arm must be present. Finally, on any tag_arm containing a 
declaration (idn: type_spec), type_spec must equal the type specified as corresponding in T to the 
tag or tags named in the tag_arm. 

An example of a tagcase statement is 

    pair = struct[car: int, cdr: int_list]
    x: oneof[pair: pair, empty: null]

    while true do
        tagcase x
            tag empty: return(false)
            tag pair (p: pair): if p.car = i
                                   then return(true)
                                   else x := down(p.cdr)
                                   end
            end
        end

This statement might be used in a list (of integers) operation that determines whether some given
integer (i) is on the list.
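
The next sketch shows a tagcase statement with an others arm, which is required here because
not every tag is named. The type number and the procedure is_zero are hypothetical.

```clu
% Hypothetical type: an int, a real, or nothing at all.
number = oneof[i: int, r: real, none: null]

% Hypothetical procedure: true if n holds a numeric zero.
is_zero = proc (n: number) returns (bool)
    tagcase n
        tag i (x: int): return(x = 0)
        tag r (x: real): return(x = 0.0)
        others: return(false)       % covers the remaining tag, none
        end
    end is_zero
```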

11.7 Return Statement

The form of the return statement is: 

return [ ( expression ,...)] 
The return statement terminates execution of the containing procedure or iterator. If the return 
statement is in a procedure, the type of each expression must be included in the corresponding 
return type of the procedure. The expressions (if any) are evaluated from left to right, and the 
objects obtained become the results of the procedure. If the return statement occurs in an iterator,
no results can be returned.

For example, inside a procedure p with type

    proctype (...) returns (int, char)
the statement

    return(3, 'a')
is legal and returns the two result objects 3 and 'a'.

11.8 Yield Statement 

Yield statements may occur only in the body of an iterator. The form of a yield statement is: 
yield [ ( expression ,...)] 
It has the effect of suspending operation of the iterator, and returning control to the invoking for 
statement. The values obtained by evaluating the expressions (left to right) are passed to the for 
statement to be assigned to the corresponding list of identifiers. The type of each expression must 
be included in the corresponding yield type of the iterator. 
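
An item may consist of more than one object. As a sketch, the hypothetical iterator indexed
below yields each index of an array together with the element stored at that index, using the
indexes iterator of array[int]:

```clu
% Hypothetical iterator: yields (index, element) pairs of a, in index order.
indexed = iter (a: array[int]) yields (int, int)
    for i: int in array[int]$indexes(a) do
        yield(i, a[i])
        end
    end indexed
```

A for statement of the form "for i, x: int in indexed(a) do body end" would then receive the
two yielded objects in i and x on each cycle.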



12. Exception Handling and Exits 

A routine is designed to perform a certain task. However, in some cases that task may be 
impossible to perform. In such a case, instead of returning normally (which would imply successful 
performance of the intended task), the routine should notify its caller by signalling an exception, 
consisting of a descriptive name and zero or more result objects. 

For example, the procedure string$fetch takes a string and an integer index and returns the
character of the string with the given index. However, if the integer is not a legal index into the
string, the exception bounds is signalled instead. The type specification of a routine contains a
description of the exceptions it may signal; for example, string$fetch is of type

    proctype (string, int) returns (char) signals (bounds)

The exception handling mechanism consists of two parts, the signalling of exceptions and the 
handling of exceptions. Signalling is the way a routine notifies its caller of an exceptional 
condition; handling is the way the caller responds to such notification. A signalled exception 
always goes to the immediate caller, and the exception must be handled in that caller. When a 
routine signals an exception, the current activation of that routine terminates and the 
corresponding invocation (in the caller) is said to raise the exception. When an invocation raises 
an exception, control immediately transfers to the closest applicable handler. Handlers are attached 
to statements; when execution of the handler completes, control passes to the statement following 
the one to which the handler is attached. 

The exception failure serves as a general catch-all error indication. When raised, it implies 
that some lower-level abstraction has failed in an unexpected (and possibly catastrophic) way. 
Failure is accompanied by a string result explaining the reason for the failure. All routines can 
potentially signal failure. Failure is implicitly part of all routine headings and routine types; a 
signals clause must not list failure explicitly. 

12.1 Signal Statement 

An exception is signalled with a signal statement, which has the form: 
signal name [ ( expression ,...)] 
A signal statement may appear anywhere in the body of a routine. The execution of a signal 
statement begins with evaluation of the expressions (if any), from left to right, to produce a list of 
exception results. The activation of the routine is then terminated. Execution continues in the 
caller as described in Section 12.2 below. 

The exception name must be either one of the exception names listed in the routine heading, 
or failure. If the corresponding exception specification in the heading has the form 

    name(T1, ..., Tn)
then there must be exactly n expressions in the signal statement, and the type of the ith expression
must be included in Ti. If the name is failure, then there must be exactly one expression present,
of type string.

The following useless procedure contains a number of examples of signal statements: 

    signaller = proc (i: int) returns (int) signals (zero, negative(int))
        if i < 0 then signal negative(-i)
        elseif i > 0 then return(i)
        elseif i = 0 then signal zero
        else signal failure("unreachable statement executed!")
        end
        end signaller

12.2 Except Statement 

When a routine activation terminates by signalling an exception, the corresponding invocation 
(the text of the call) is said to raise that exception. By attaching handlers to statements, the caller 
can specify the action to be taken when an exception is raised. 

A statement with handlers attached is called an except statement, and has the form: 

    statement except { when_handler }
                     [ others_handler ]
                     end
where

    when_handler ::= when name , ... [ ( decl , ... ) ] : body
                   | when name , ... ( * ) : body

    others_handler ::= others [ ( idn : type_spec ) ] : body
Let S be the statement to which the handlers are attached, and let X be the entire except
statement. Each when_handler specifies one or more exception names and a body. The body is
executed if an exception with one of those names is raised by an invocation in S. All of the names
listed in the when_handlers must be distinct. The optional others_handler is used to handle all
exceptions not explicitly named in the when_handlers. The statement S can be any form of
statement, and can even be another except statement.

If, during the execution of S, some invocation in S raises an exception E, control immediately
transfers to the closest applicable handler; i.e., the closest handler for E that is attached to a
statement containing the invocation. When execution of the handler completes, control passes to
the statement following the one to which the handler is attached. Thus if the closest handler is
attached to S, the statement following X is executed next. If execution of S completes without
raising an exception, the attached handlers are not executed.

An exception raised inside a handler is treated the same as any other exception: control passes
to the closest handler for that exception. Note that an exception raised in some handler attached to
S cannot be handled by any handler attached to S; either the exception is handled within the
handler, or it is handled by some handler attached to a statement containing X.

We now consider the forms of handlers in more detail. The form

    when name , ... [ ( decl , ... ) ] : body
is used to handle exceptions with the given names when the exception results are of interest. The
optional declared variables, which are local to the handler, are assigned the exception results before
the body is executed. Every exception potentially handled by this form must have the same
number of results as there are declared variables, and the types of the results must equal the types
of the variables. The form

    when name , ... ( * ) : body
handles all exceptions with the given names, regardless of whether or not there are exception
results; any actual results are discarded. Hence exceptions with differing numbers and types of
results can be handled together.

The form

    others [ ( idn : type_spec ) ] : body
is optional, and must appear last in a handler list. This form handles any exception not handled
by other handlers in the list. If a variable is declared, it must be of type string. The variable,
which is local to the handler, is assigned a lower case string representing the actual exception name;
any results are discarded.

Note that exception results are ignored when matching exceptions to handlers; only the names
of exceptions are used. Thus the following is illegal, in that int$div signals zero_divide without
any results, but the closest handler has a declared variable:

    begin
        y: int := 0
        x: int := 3 / y
            except when zero_divide (z: int): return end
        end
        except when zero_divide: return end
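
By way of contrast, here is a sketch of a legal use of handlers, in which each handler declares
exactly the results that the named exception carries. The procedure describe is hypothetical;
it calls the signaller procedure of Section 12.1 and uses int$unparse to convert an integer to
a string.

```clu
% Hypothetical caller of signaller (Section 12.1).
describe = proc (i: int) returns (string)
    return(int$unparse(signaller(i)))
        except when zero: return("zero")                  % zero has no results
               when negative (m: int): return("minus " || int$unparse(m))
               end
    end describe
```

Had the second handler been written "when negative (*)", it would also be legal, but the
result m would not be available in its body.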

An invocation need not be surrounded by except statements that handle all potential 
exceptions. This policy was adopted because in many cases the programmer can prove that a 
particular exception will not arise. For example, the invocation int$div(x, 7) will never signal
zero_divide. However, this policy does lead to the possibility that some invocation may raise an
exception for which there is no handler. To avoid this situation, every routine body is contained
implicitly in an except statement of the form

    begin routine_body end
        except when failure (s: string): signal failure(s)
               others (s: string): signal failure("unhandled exception: " || s)
               end

Failure exceptions are propagated unchanged; an exception named name becomes

    failure("unhandled exception: name")

12.3 Resignal Statement 

A resignal statement is a syntactically abbreviated form of exception handling: 

statement resignal name , ... 
Each name listed must be distinct, and each must be either one of the condition names listed in the 
routine heading, or failure. The resignal statement acts like an except statement containing a 
handler for each condition named, where each handler simply signals that exception with exactly 
the same results. Thus, if the resignal clause names an exception specification in the routine 
heading of the form 

    name(T1, ..., Tn)
then effectively there is a handler of the form

    when name (x1: T1, ..., xn: Tn): signal name(x1, ..., xn)

As for an explicit handler of this form, every exception potentially handled by this implicit handler
must have the same number of results as declared in the exception specification, and the types of
the results must equal the types listed in the exception specification.

As a simple example, if a routine has a signals clause of the form

    signals (underflow, overflow)
then

    x: real := 3.14159 * y * y
        resignal underflow, overflow
is equivalent to

    x: real := 3.14159 * y * y
        except when underflow: signal underflow
               when overflow: signal overflow
               end
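
A complete routine using resignal might look like the following sketch. The procedure mean is
hypothetical; its heading lists overflow, as the resignal statement requires.

```clu
% Hypothetical procedure: the mean of two ints; an overflow raised by the
% addition is passed on to the caller under the same name.
mean = proc (a, b: int) returns (int) signals (overflow)
    return((a + b) / 2)
        resignal overflow
    end mean
```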

12.4 Exit Statement 

A local transfer of control can be effected by using an exit statement, which has the form: 
exit name [ ( expression ,...)] 
An exit statement is similar to a signal statement except that where the signal statement signals 
an exception to the calling routine, the exit statement raises the exception directly in the current 
routine. An exception raised by an exit statement must be handled (explicitly) by a containing 
except statement with a handler of the form 

when name ,...[( decl ,...)]: body 
As usual, the types of the expressions in the exit statement must equal the types of the variables 
declared in the handler. The handler must be an explicit one, i.e., exits to the implicit handlers of 
resignal statements or to the implicit failure handler enclosing a routine body are illegal. 

The exit statement and the signal statement mesh nicely to form a uniform mechanism. The 
signal statement can be viewed simply as terminating a routine activation; an exit is then 
performed at the point of invocation in the caller. (Because this exit is implicit, it is not subject to 
the restrictions on exits listed above.) 

The following is a simple example of the use of exits in search loops: 

    elt: T
    begin
        for elt in array[T]$elements(x) do
            if special(elt) then exit found end
            end
        elt := make_new_one(...)    % Didn't find one, so make one up
        end except when found: end
    % At this point we have an object and we don't care how we got it

12.5 Example 

We now present an example demonstrating the use of exception handlers. We will write a 
procedure, sum_stream, which reads a sequence of signed decimal integers from a character stream 
and returns the sum of those integers. The stream is viewed as containing a sequence of fields 
separated by spaces; each field must consist of a non-empty sequence of digits, optionally preceded 
by a single minus sign. Sum_stream has the form

    sum_stream = proc (s: stream) returns (int) signals (overflow,
                                                         unrepresentable_integer(string),
                                                         bad_format(string))
        ...
        end sum_stream
Sum_stream signals overflow if the sum of the numbers or an intermediate sum is outside the
implemented range of integers. Unrepresentable_integer is signalled if the stream contains an
individual number that is outside the implemented range of integers. Bad_format is signalled if
the stream contains a field that is not an integer.

We will use the getc operation of the stream data type (see Appendix III), whose type is

    proctype (stream) returns (char) signals (end_of_file, not_possible(string))
This operation returns the next character from the stream, unless the stream is empty, in which
case end_of_file is signalled. Not_possible is signalled if the operation cannot be performed on the
given stream (e.g., it is an output stream, or does not allow character operations, etc.). We will
assume that we are given a stream for which getc is always possible.

The following procedure is used to convert character strings to integers: 

    s2i = proc (s: string) returns (int) signals (invalid_character(char),
                                                  bad_format,
                                                  unrepresentable_integer)
        ...
        end s2i
S2i signals invalid_character if its string argument contains a character other than a digit or a
minus sign. Bad_format is signalled if the string contains a minus sign following a digit, more
than one minus sign, or no digits. Unrepresentable_integer is signalled if the string represents an
integer that is outside the implemented range of integers.

An implementation of sum_stream is presented in Figure 5. There are two loops within an
infinite loop: one to skip spaces, and one to accumulate digits for conversion to a number. Notice
the placement of the inner end_of_file handler. If end_of_file is raised in the second inner loop,
then the sum is computed correctly, and the first invocation of stream$getc will again raise
end_of_file. This time, however, the infinite loop is terminated and execution transfers to the
other end_of_file handler, which then returns the accumulated sum.

Fig. 5. The sum_stream procedure.

    sum_stream = proc (s: stream) returns (int) signals (overflow,
                                                         unrepresentable_integer(string),
                                                         bad_format(string))
        sum: int := 0
        num: string
        while true do
            % skip over spaces between values; sum is valid, num is meaningless
            c: char := stream$getc(s)
            while c = ' ' do
                c := stream$getc(s)
                end
            % read a value; num accumulates new number, sum becomes previous sum
            num := ""
            while c ~= ' ' do
                num := string$append(num, c)
                c := stream$getc(s)
                end
                except when end_of_file: end
            % restore sum to validity
            sum := sum + s2i(num)
            end
            except when end_of_file: return(sum)
                   when unrepresentable_integer: signal unrepresentable_integer(num)
                   when bad_format, invalid_character (*): signal bad_format(num)
                   when overflow: signal overflow
                   end
        end sum_stream

We have placed the remaining exception handlers outside of the infinite loop to avoid
cluttering up the main part of the algorithm. Each of these exception handlers could also have
been placed after the particular statement containing the invocation that signalled the
corresponding exception. The (*) form is used in the handler for the bad_format and
invalid_character exceptions since the exception results are not used. Note that the overflow
handler catches exceptions signalled by the int$add procedure, which is invoked using the infix +
notation. Note also that in this example all of the exceptions raised by sum_stream originate as
exceptions signalled by lower-level modules. Sum_stream simply reflects these exceptions upwards
in terms that are meaningful to its callers. Although some of the names may be unchanged, the
meanings of the exceptions (and even the number of results) are different in the two levels.

As mentioned above, we have assumed stream$getc never signals not_possible; if it does, then
sum_stream will terminate, raising the exception

    failure("unhandled exception: not_possible")



13. Modules

A CLU program consists of a group of modules. Three kinds of modules are provided, one
for each kind of abstraction we have found to be useful in program construction: 

    module ::= { equate } procedure
             | { equate } iterator
             | { equate } cluster
Procedures support procedural abstraction, iterators support control abstraction, and clusters 
support data abstraction. 

A module defines a new scope. The identifiers introduced in the equates (if any) and the 
identifier naming the abstraction (the module name) are local to that scope (and therefore may not 
be redefined in an inner scope). Abstractions implemented by other modules are referred to by 
using non-local identifiers. The system will provide some means of determining what abstractions 
are meant by these non-local identifiers; one such mechanism is defined in Section 4. 

The existence of an externally established meaning for an identifier does not preclude a local 
definition for that identifier. Within a module, any identifier may be used in a purely local 
fashion or in a purely non-local fashion, but no identifier may be used in both ways. 

Example programs appear in Appendix IV. 

13.1 Procedures 

A procedure performs an action on zero or more arguments, and terminates returning zero or 
more results. A procedure supports a procedural abstraction: a mapping from a set of input 
objects to a set of result objects, with possible modification of some of the input objects. A 
procedure may terminate in one of a number of conditions; one of these is the normal condition, 
while others are exceptional conditions. Differing numbers and types of results may be returned in 
the different conditions. 

The form of a procedure is 

idn = proc [ parms ] args [ returns ] [ signals ] [ where ]
          routine_body
      end idn

where

args ::= ( [ decl , ... ] )

returns ::= returns ( type_spec , ... )

signals ::= signals ( exception , ... )

exception ::= name [ ( type_spec , ... ) ]

routine_body ::= { equate }
                 { own_var }
                 { statement }

In this section we discuss non-parameterized procedures. For a non-parameterized procedure, 
the parms and where clauses are missing. Parameterized modules are discussed in Section 13.4. 
Own variables are discussed in Section 13.5. 

The heading of a procedure describes the way in which the procedure communicates with its 
caller. The args clause specifies the number, order, and types of arguments required to invoke the 
procedure, while the returns clause specifies the number, order, and types of results returned when 
the procedure terminates normally (by executing a return statement or reaching the end of its 
body). A missing returns clause indicates that no results are returned. 

The signals clause names the exceptional conditions in which the procedure can terminate, 
and specifies the number, order, and types of result objects returned in each condition. In addition 
to the conditions explicitly named in the signals clause, any procedure can terminate in the failure
condition. The failure condition returns with one result, a string object. All names of exceptions 
in the signals clause must be distinct, and none can be failure. 

A procedure is an object of some procedure type. For a non-parameterized procedure, this
type is derived from the procedure heading by removing the procedure name, rewriting the formal
argument declarations with one idn per decl, deleting the names of formal arguments, and finally,
replacing proc by proctype.

As was discussed in Section 9.3, the invocation of a procedure causes the introduction of the 
formal variables, and the actual arguments are assigned to these variables. Then the procedure 
body is executed. Execution terminates when a return statement or a signal statement is executed, 
or when the textual end of the body is reached. If a procedure that should return results reaches 
the textual end of the body, the procedure terminates in the condition 

failure("no return values")

At termination the result objects, if any, are passed back to the invoker of the procedure.

The idn following the end of the procedure must be the same as the idn naming the
procedure. 

Examples of procedures are given in Appendix IV. 
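
As an illustration of the heading rules described in this section, the following is a minimal sketch of a non-parameterized procedure. The procedure name safe_div and the exception name zero_divisor are hypothetical and do not appear elsewhere in this manual.

```clu
% Hypothetical example: integer division with an explicitly named exception.
safe_div = proc (num, den: int) returns (int) signals (zero_divisor)
    % Terminate in the exceptional condition zero_divisor, with no results.
    if den = 0 then signal zero_divisor end
    % Terminate in the normal condition, with one int result.
    return(num / den)
    end safe_div
```

Following the derivation described earlier in this section, the type of this procedure would be proctype (int, int) returns (int) signals (zero_divisor): the name is removed, the declaration num, den: int is rewritten as num: int, den: int, the argument names are deleted, and proc is replaced by proctype.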

13.2 Iterators 

An iterator computes a sequence of items, one item at a time, where an item is a group of zero 
or more objects. In the generation of such a sequence, the computation of each item of the 
sequence is usually controlled by information about what previous items have been produced. Such 
information and the way it controls the production of items is local to the iterator. The user of the 
iterator is not concerned with how the items are produced, but simply uses them (through the for 
statement) as they are produced. Thus the iterator abstracts from the details of how the production 
of the items is controlled; for this reason, we consider an iterator to implement a control abstraction. 
Iterators are particularly useful as operations of data abstractions that are collections of objects 
(e.g., sets), since they may produce the objects in a collection without revealing how the collection is 
represented. 

An iterator has the form 

idn = iter [ parms ] args [ yields ] [ signals ] [ where ]
          routine_body
      end idn

where

yields ::= yields ( type_spec , ... )

In this section we discuss non-parameterized iterators, in which the parms and where clauses are 
missing. Parameterized modules are discussed in Section 13.4. Own variables are discussed in 
Section 13.5. 

The form of an iterator is very similar to the form of a procedure. There are only two 
differences: 

1. An iterator has a yields clause in its heading in place of the returns 
clause of a procedure. The yields clause specifies the number, order, 
and types of objects yielded each time the iterator produces the next 
item in the sequence. If zero objects are yielded, then the yields clause 
is omitted. 

2. Within the iterator body, the yield statement is used to present the next 
item in the sequence. An iterator terminates in the same manner as a 
procedure (note that it may not return any results).

An iterator is an object of some iterator type. Its type can be derived from its heading by
removing the iterator name, rewriting the formal argument declarations with one idn per decl,
deleting the formal argument names, and finally, replacing iter by itertype.

An iterator can be invoked only by a for statement. The execution of iterators is described in
Section 11.5.2.

An example of an iterator is 

splits = iter (s: string) yields (string, string)
    for i: int in int$from_to(0, string$size(s)) do
        yield(string$substr(s, 1, i), string$rest(s, i + 1))
        end
    end splits

Additional examples of iterators are given in the next section. 
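
A sketch of how a caller might use the splits iterator: the for statement declares one variable for each object in a yielded item. The procedure count_splits is hypothetical and serves only as an illustration.

```clu
% Hypothetical caller of splits: counts the splits of s in which
% both pieces are non-empty.
count_splits = proc (s: string) returns (int)
    n: int := 0
    for head, tail: string in splits(s) do
        if string$size(head) > 0 cand string$size(tail) > 0
           then n := n + 1
           end
        end
    return(n)
    end count_splits
```

The caller never sees the index variable i used inside splits; it only receives each pair of strings as it is yielded.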

Remarks 

Iterators provide a useful mechanism for abstracting from the details of control. Furthermore,
they permit for statements to iterate over the objects of interest, rather than requiring a mapping 
from the integers to those objects. 

It is important to realize that the argument objects passed to the iterator are also accessible in 
the body of the for loop controlled by the iterator. If some argument object is mutable, and the 
iterator modifies it, the change can affect the behavior of the for loop body, and vice-versa. Such 
changes can be the cause of program errors. 

As a general principle, an iterator should not modify its argument objects. There are some 
examples, however, where modification is appropriate. For example, an iterator that produces the 
characters from an input stream would advance the stream "window" (the currently accessible 
character) on each iteration. 

Also as a general principle, the for loop body should not modify the iterator's argument 
objects. Again, occasional examples exist where modification is desirable. In programming such 
examples, the programmer must ensure that the iterator will still behave correctly in spite of the 
modifications.

13.3 Clusters

A cluster is used to implement a new data type, distinct from any other built-in or user-defined 
data type. A data type (or data abstraction) consists of a set of objects and a set of primitive 
operations. The primitive operations provide the most basic ways of manipulating the objects; 
ultimately every action that can be performed on the objects must be expressed in terms of the 
primitive operations. Thus the primitive operations define the lowest level of observable object 
behavior. 

The form of a cluster is 

idn = cluster [ parms ] is idn , ... [ where ]
          cluster_body
      end idn

where

cluster_body ::= { equate } rep = type_spec { equate }
                 { own_var }
                 routine { routine }

routine ::= procedure
          | iterator

In this section we discuss non-parameterized clusters, in which the parms and where clauses are
missing. Parameterized modules are discussed in Section 13.4. Own variables are discussed in
Section 13.5.

The primitive operations are named by the list of idns following the reserved word is. All of
the idns in this list must be distinct. 

To define a new data type, it is necessary to choose a concrete representation for the objects of 
the type. The special equate 

rep = type_spec

within the cluster body identifies type_spec as the concrete representation. Within the cluster, rep
may be used as an abbreviation for type_spec.

The identifier naming the cluster is available for use in the cluster body. Use of this 
identifier within the cluster body permits the definition of recursive types (an example is given 
below).

In addition to specifying the representation of objects, the cluster must implement the primitive
operations of the type. The operations may be either procedural or control abstractions; they are 
implemented by procedures and iterators, respectively. Most of the routines in the cluster body 
define the primitive operations (those whose names are listed in the cluster heading). Any 
additional routines are hidden: they are private to the cluster and may not be invoked by users of 
the abstract type. All the routines must be named by distinct identifiers; the scope of these 
identifiers is the entire cluster. 

Outside the cluster, the type's objects may only be treated abstractly (i.e., manipulated by using
the primitive operations). To implement the operations, however, it is usually necessary to 
manipulate the objects in terms of their concrete representation. It is also convenient sometimes to 
manipulate the objects abstractly. Therefore, inside the cluster it is possible to view the type's 
objects either abstractly or in terms of their representation. The syntax is defined to specify 
unambiguously, for each variable that refers to one of the type's objects, which view is being taken. 
Thus, inside a cluster named T, a declaration 

v: T

indicates that the object referred to by v is to be treated abstractly, while a declaration

w: rep

indicates that the object referred to by w is to be treated concretely. Two primitives, up and down,
are available for converting between these two points of view. The use of up permits a type rep 
object to be viewed abstractly, while down permits an abstract object to be viewed concretely. For 
example, given the declarations above, the following two assignments are legal: 

v := up(w)
w := down(v) 

Only routines inside a cluster may use up and down. Note that up and down are used merely to
inform the compiler that the object is going to be viewed abstractly or concretely, respectively.

A common place where the view of an object changes is at the interface to one of the type's
operations: the user, of course, views the object abstractly, while inside the operation, the object is
viewed concretely. To facilitate this usage, a special type specification, cvt, is provided. The use
of cvt is restricted to the args, returns, yields and signals clauses of routines inside a cluster, and
cvt may be used at the top level only (e.g., array[cvt] is illegal). When used inside the args clause, it
means that the view of the argument object changes from abstract to concrete when it is assigned
to the formal argument variable. When cvt is used in the returns, yields, or signals clause, it
means the view of the result object changes from concrete to abstract as it is returned (or yielded)
to the caller. Thus cvt means abstract outside, concrete inside: when constructing the type of a 
routine, cvt is equivalent to the abstract type, but when type-checking the body of a routine, cvt is 
equivalent to the representation type. 

The cvt form does not introduce any new ability over what is provided by up and down. It 
is merely a shorthand for a common case. In its absence, the heading of each routine would have 
to be written using the abstract type in place of cvt. Then inside the routine, additional variables
of type rep would be declared, the argument objects assigned to these variables using down, and 
each return, yield, or signal statement would use up explicitly. The use of cvt simply causes the 
appropriate up or down to be performed automatically, and avoids the declaration of additional 
variables. 
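
The equivalence can be sketched as follows for a hypothetical cluster T whose rep is array[int]. The routine names and the choice of rep are assumptions made for illustration, and the two versions of each routine are alternatives (a cluster would contain only one of them).

```clu
% Written with cvt:
size = proc (s: cvt) returns (int)
    return(rep$size(s))
    end size

copy = proc (s: cvt) returns (cvt)
    return(rep$copy(s))
    end copy

% Written without cvt, using down and up explicitly:
size = proc (s: T) returns (int)
    r: rep := down(s)          % view the argument concretely
    return(rep$size(r))
    end size

copy = proc (s: T) returns (T)
    r: rep := down(s)
    return(up(rep$copy(r)))    % view the result abstractly
    end copy
```

In both versions the type of size is proctype (T) returns (int) and the type of copy is proctype (T) returns (T).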

The type of each routine is derived from its heading in the usual manner, except that each 
occurrence of cvt is replaced by the abstract type. 

Inside the cluster, it is not necessary to use the compound form (type_spec$op_name) for
naming locally defined routines. Furthermore, the compound form cannot be used for invoking 
hidden routines. 

The identifier following the end must match the identifier naming the cluster. 

Some examples of clusters are shown in Figure 6. The first example implements (part of) a 
complex number data type. This data type may be implemented using either x and y coordinates, 
or rho and theta coordinates; the cluster shown uses x and y coordinates. Note that the create, 
get_x, and get_y operations might signal an exception if rho/theta coordinates were used; therefore 
these exceptions are listed in the headings, even though in this implementation the exceptions will 
not be signalled. The coordinates of a complex number can be queried using the get operations 
explicitly, or by using the special syntax, e.g., 

a.theta

No set operations are provided, since complex numbers should be immutable like other numbers
(integers, reals, etc.). Other operations on complex numbers are the usual arithmetic ones (only add
is shown), and equal, similar, and copy (these are discussed in the remarks section below). (Note: we
have assumed that square_root and arctangent2 exist in the library.)

The second example cluster implements lists of integers. These lists are immutable, like pure 
lists in LISP. The implementation is recursive: the representation type refers to the abstract type. 
Notice the elements operation, which produces all integers in the list in order; it is an example of a
recursive iterator.

The final example is sets of integers. The sets are mutable: operations insert and delete 
modify sets. Again note the elements iterator, which produces all elements of a set in some 
unspecified order. Also note the use of is_in in insert; since is_in requires an abstract object as its
argument, up is used to provide one. 

Remarks 

The main reason CLU was developed was to support the use of data abstractions. Use of data 
abstractions leads to an object-oriented style of programming, in which concerns about data are 
primary and serve to organize program structure. It requires some effort to learn to program in 
this style, but the effort is worthwhile because the resulting programs are more modular, and easier 
to modify and maintain. 

A cluster permits all knowledge about how a data abstraction is being implemented to be kept 
local to the cluster. This localization permits the correctness of an implementation to be established 
by examining the cluster alone. Part of such a correctness proof involves showing that only legal 
representations are generated by the cluster. For example, in the int_set cluster above, not all 
arrays are legal int_set representations; only those without duplicate elements are legal.
Information about what constitutes a legal representation is described during program verification 
by stating the concrete invariant. Each operation must preserve this invariant for each object that 
it manipulates of the abstract type. This requirement applies at all return and signal statements 
in operations, and also at yield statements in iterator operations. 

When defining a new data type, it is important to provide a set of primitive operations 
sufficient to permit all interesting manipulations of the objects. There is no reason to attempt to 
define a minimal set, however; frequently used operations can be made operations of the cluster 
even if they could be implemented in terms of other operations. 

Operations that will frequently be required are copy, equal, and similar. These operations are 
needed if the type being defined is intended for general use, since without these operations, the use 
of the type within another type's concrete representation is somewhat limited. For example, 
array[T]$copy cannot be used unless T has a copy operation. In addition, most types should
provide I/O operations as discussed in Appendix III.

Fig. 6. Example Clusters

complex = cluster is create, add, get_x, get_y, get_rho, get_theta, equal, similar, copy

    rep = struct[x, y: real]

    create = proc (x, y: real) returns (cvt) signals (overflow, underflow)
        return(rep${x: x, y: y})
        end create

    add = proc (a, b: cvt) returns (cvt) signals (overflow, underflow)
        return(rep${x: a.x + b.x, y: a.y + b.y})
           resignal overflow, underflow
        end add

    get_x = proc (c: cvt) returns (real) signals (overflow, underflow)
        return(c.x)
        end get_x

    get_y = proc (c: cvt) returns (real) signals (overflow, underflow)
        return(c.y)
        end get_y

    get_rho = proc (c: cvt) returns (real) signals (overflow, underflow)
        return(square_root(c.x * c.x + c.y * c.y))
           resignal overflow, underflow
        end get_rho

    get_theta = proc (c: cvt) returns (real) signals (overflow, underflow)
        return(arctangent2(c.x, c.y))
           resignal overflow, underflow
        end get_theta

    % Note that the equal operation of the rep type tests equality of corresponding real components,
    % not identity of rep objects.

    equal = proc (c1, c2: cvt) returns (bool)
        return(c1 = c2)
        end equal

    similar = proc (c1, c2: cvt) returns (bool)
        return(c1 = c2)
        end similar

    copy = proc (c: complex) returns (complex)
        return(c)
        end copy

    end complex

int_list = cluster is create, cons, car, cdr, is_in, is_empty, elements, equal, similar, copy

    rep = oneof[pair: pair, empty: null]
    pair = struct[car: int, cdr: int_list]

    create = proc () returns (cvt)
        return(rep$make_empty(nil))
        end create

    cons = proc (i: int, lst: int_list) returns (cvt)
        return(rep$make_pair(pair${car: i, cdr: lst}))
        end cons

    car = proc (lst: cvt) returns (int) signals (empty)
        tagcase lst
           tag pair (p: pair): return(p.car)
           tag empty: signal empty
           end
        end car

    cdr = proc (lst: cvt) returns (int_list) signals (empty)
        tagcase lst
           tag pair (p: pair): return(p.cdr)
           tag empty: signal empty
           end
        end cdr

    is_in = proc (lst: cvt, i: int) returns (bool)
        while true do
            tagcase lst
               tag empty: return(false)
               tag pair (p: pair): if p.car = i
                                      then return(true)
                                      else lst := down(p.cdr)
                                      end
               end
            end
        end is_in

    is_empty = proc (lst: cvt) returns (bool)
        return(rep$is_empty(lst))
        end is_empty

    elements = iter (lst: cvt) yields (int)
        tagcase lst
           tag pair (p: pair): yield(p.car)
                               for i: int in elements(p.cdr) do
                                   yield(i)
                                   end
           tag empty:
           end
        end elements

    % Note that the equal operation of the rep type tests equality of corresponding list elements, not
    % identity of rep objects.

    equal = proc (lst1, lst2: cvt) returns (bool)
        return(lst1 = lst2)
        end equal

    similar = proc (lst1, lst2: cvt) returns (bool)
        return(lst1 = lst2)
        end similar

    copy = proc (lst: int_list) returns (int_list)
        return(lst)
        end copy

    end int_list



int_set = cluster is create, insert, delete, is_in, size, elements, equal, similar, copy

    rep = array[int]

    create = proc () returns (cvt)
        return(rep$new())
        end create

    insert = proc (s: cvt, i: int)
        if ~ is_in(up(s), i) then rep$addh(s, i) end
        end insert

    delete = proc (s: cvt, i: int)
        for j: int in rep$indexes(s) do
            if i = s[j]
               then s[j] := rep$top(s)
                    rep$remh(s)
                    return
               end
            end
        end delete

    is_in = proc (s: cvt, i: int) returns (bool)
        for j: int in rep$elements(s) do
            if i = j then return(true) end
            end
        return(false)
        end is_in

    size = proc (s: cvt) returns (int)
        return(rep$size(s))
        end size

    elements = iter (s: cvt) yields (int)
        for i: int in rep$elements(s) do
            yield(i)
            end
        end elements

    equal = proc (s1, s2: cvt) returns (bool)
        return(s1 = s2)
        end equal

    similar = proc (s1, s2: int_set) returns (bool)
        if size(s1) ~= size(s2) then return(false) end
        for i: int in elements(s1) do
            if ~ is_in(s2, i) then return(false) end
            end
        return(true)
        end similar

    copy = proc (s: cvt) returns (cvt)
        return(rep$copy(s))
        end copy

    end int_set

In many earlier sections, we have discussed the use of special syntactic forms for invoking
operations, and have described how operations must be named and defined in order to make use of 
these syntactic forms. The use of such forms is quite unconstrained: the special form is translated 
to an invocation, and is legal if the invocation is legal. 

Our reason for not imposing more syntactic constraints on operator overloading is that such 
constraints only capture a small part of what it means to use operator overloading correctly. For 
example, to overload "=" correctly, the equal operation should be an equivalence relation satisfying
the substitution property; i.e., if two objects are equal, then one can be substituted for the other 
without any detectable difference in behavior. In the sections where special syntactic forms are 
described, we have discussed in each case what constitutes proper usage. 

Overloading operator symbols is not the only place where care must be taken to ensure that the 
new definition agrees with common usage; the same care must be taken when redefining common 
operation names. For example, the copy operation should provide a "copy" of its input object, such 
that subsequent changes made to either the old or the new object do not affect the other. In the 
case of an immutable type, like complex_number above, in which sharing between two objects will 
never be visible to the using program, copy can simply return its input object. Ordinarily, however, 
copy should copy its input objects, including each component (using the copy operation of the 
component's type), as is done in the implementation of int_set.

The equal operation should return true if its two input objects are the same abstract object. 
This is necessary to satisfy the substitution property: if two objects are equal, then using one in 
place of the other in a computation will not alter the computation. Thus, implementing equal 
properly requires a thorough understanding of both the abstraction being implemented and the 
representation being used. Usually two mutable objects are equal only if they are the exact same 
object in the CLU universe; e.g., see int_set$equal above. For immutable objects, the contents of 
the object is usually all that matters; e.g., see complex$equal and int_list$equal above.

The similar operation should return true only if its two input objects (both of the same type) 
have "equivalent state". This means that any query made about information in two similar objects
immediately after they were determined to be similar would provide an equivalent answer for 
either of the two objects (i.e., the answers would be similar). Note that similar is a weaker 
condition than equal: two objects are equal if they are the same abstract objects, and so of course 
they are similar for all time. Equal and similar return different results only for mutable types, 
because only mutable types have objects whose state can change. Copy and similar should be
related as follows for any type T:

∀ x ∈ T [ T$similar(x, T$copy(x)) ]

With the exception of set and store operations, procedures that define operator symbols, copy,
similar, and the I/O operations should never modify their input objects in a way that the user of 
the object can detect. This rule does not prohibit "benevolent" side-effects, i.e., modifications that 
speed up future operations without affecting behavior in any other way. 

13.4 Parameterized Modules 

Procedures, iterators, and clusters may all be parameterized. Parameterization permits a set of 
related abstractions to be defined by a single module. Recall that in each module heading there is 
an optional parms clause and an optional where clause. The presence of the parms clause
indicates that the module is parameterized; the where clause states certain constraints on
permissible actual values for the parameters.

The form of the parms clause is 
[ parm , ... ]

where

parm ::= idn , ... : type_spec
       | idn , ... : type

Each parameter is declared like an argument. However, only the following types of parameters are
legal: int, real, bool, char, string, null, and type. Parameters are limited to these types because
the actual values for parameters are required to be constants that can be computed at compile-time. 
This requirement ensures that all types are known at compile-time, and permits complete 
compile-time type-checking. 

In a parameterized module, the scope rules permit the parameters to be used throughout the 
remainder of the module. Thus they can be used in defining the types of arguments and results, 

e.g.,

p = proc [t: type] (x: t) returns (t)

To use a parameterized module, it is first necessary to instantiate it; that is, to provide actual, 
constant values for the parameters. (The exact forms of such constants were discussed in 
Section 8.3.) The result of instantiation is a procedure, iterator, or type (where the parameterized 
module was a procedure, iterator, or cluster, respectively) that may be used just like a 
non-parameterized module of the same kind. For each distinct instantiation (i.e., for each distinct
list of actual parameters), a distinct procedure, iterator, or type is produced.

The meaning of a parameterized module is most easily understood in terms of rewriting. 
When the module is instantiated, the actual parameter values are substituted for the formal 
parameters throughout the module, and the parms clause and where clause are deleted. The 
resulting module is a regular (non-parameterized) module. In the case of a cluster some of the 
operations may have additional parameters; further rewriting will be performed when these 
operations are used. 

In the case of a type parameter, constraints on permissible actual types can be given in the 
where clause. The where clause lists a set of operations that the actual type is required to have, 
and also specifies the type of each required operation. The where clause constrains the 
parameterized module as well: the only primitive operations of the type parameter that can be used 
are those listed in the where clause. 
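
The rewriting view and the role of the where clause can be sketched with a small parameterized procedure. The names max and max_int are hypothetical. In particular, instantiation does not actually introduce a name such as max_int; it is written out here only to show the result of the substitution.

```clu
% Hypothetical parameterized procedure: returns the larger of x and y.
% The where clause requires the actual type to have an lt operation,
% which is the only primitive operation of t that the body may use.
max = proc [t: type] (x, y: t) returns (t)
            where t has lt: proctype (t, t) returns (bool)
    if x < y then return(y) end      % x < y is short for t$lt(x, y)
    return(x)
    end max

% The instantiation max[int] behaves like this non-parameterized
% procedure, obtained by substituting int for t and deleting the
% parms and where clauses:
max_int = proc (x, y: int) returns (int)
    if x < y then return(y) end
    return(x)
    end max_int
```

An invocation such as max[int](3, 4) is legal because int has an lt operation of the required type. An instantiation with a type that lacks such an operation would not satisfy the where clause and so would not be legal.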

The form of the where clause is 

where ::= where restriction , ...

where

restriction ::= idn has oper_decl , ...
              | idn in type_set

oper_decl ::= op_name , ... : type_spec

op_name ::= name [ [ constant , ... ] ]

type_set ::= { idn | idn has oper_decl , ... { equate } }
           | idn

There are two forms of restrictions. In both forms, the initial idn must be a type parameter. 
The has form lists the set of required operations directly, by means of oper_decls. The type_spec
in each oper_decl must name a routine type. Note that if some of the type's operations are
parameterized, particular instantiations of those operations must be given. The in form requires
that the actual type be a member of a type_set, a set of types having the required operations. The
two identifiers in the type_set must match, and the notation is read like set notation; e.g.,

{ t | t has f: ... }

means "the set of all types t such that t has f ...". The scope of the identifier is the type_set.

The in form is useful because an abbreviation can be given for a type_set via an equate. If it
is helpful to introduce some abbreviations in defining the type_set, these are given in the optional
equates within the type_set. The scope of these equates is the type_set.

A routine in a parameterized cluster may have a where clause in its heading, and can place
further constraints on the cluster parameters. For example, any type is permissible for the array
element type, but the array similar operation requires that the element type have a similar
operation. This means that array[T] exists for any type T, but that array[T]$similar exists only
when T$similar exists. Note that a routine need not include in its where clause any of the
restrictions included in the cluster where clause. 

Two examples of parameterized clusters are shown in Figure 7. The first defines the set type 
generator. This cluster is similar to int_set, presented in the previous section. The main difference
is that everywhere that integer elements were assumed, now the parameter t is used. The set type
generator has a where clause that requires the element type to provide an equal operation; in
addition, the similar operation imposes an additional constraint on the element type by requiring a
similar operation. Thus set[X] is legal if X has an equal operation; but set[X]$similar can be used
only if X also has a similar operation. Note the procedure is_in_sim; it is a hidden routine of this
implementation. Also note the use of the type_set sim_type.

The state of a set object is the set of abstract objects currently in the set. What matters is the 
identity of the objects, not their state. This should help in understanding why equal, similar, and 
copy are written as they are. Notice that we have two new operations, similar1 and copy1. Similar1
returns true when two objects have equal state (in the abstract sense), whereas similar returns true
when they have similar state. Copy1 is to similar1 what copy is to similar, i.e.,
T$similar1(T$copy1(x), x) should always be true. In general, mutable type generators that behave
like collections should provide similar1 and copy1 to ensure that types obtained from the generator
can be used as part of the concrete representation of other types. 

The second example is a list type generator, which is similar to int_list in the previous section.
List does not place any constraints on its type parameter. Therefore any element type is permissible
for lists, including type any. Note that the types generated by the list type generator are 
immutable. The state of a list is considered to be the ordered set of objects in the list, where only 
the identity of the objects matters. Lists are immutable even if the objects in the lists are mutable, 
because the state of a list never changes. 

Confusion can arise unless the designer and implementor of a data type have in mind a clear 
idea of exactly what constitutes the state of the objects of the type they are defining; it must be 
resolved in which cases it is only the identity of the components that matters, and in which cases

Fig. 7. More Example Clusters

set = cluster [t: type] is create, insert, delete, is_in, size,
                           elements, equal, similar, similar1, copy, copy1
            where t has equal: proctype (t, t) returns (bool)

    rep = array[t]

    sim_type = {s | s has similar: proctype (t, t) returns (bool)}

    create = proc () returns (cvt)
        return(rep$new())
        end create

    insert = proc (s: cvt, v: t)
        if ~ is_in(up(s), v) then rep$addh(s, v) end
        end insert

    delete = proc (s: cvt, v: t)
        for j: int in rep$indexes(s) do
            if v = s[j]
               then s[j] := rep$top(s)
                    rep$remh(s)
                    return
               end
            end
        end delete

    is_in = proc (s: cvt, v: t) returns (bool)
        for u: t in rep$elements(s) do
            if u = v then return(true) end
            end
        return(false)
        end is_in

    is_in_sim = proc (s: cvt, v: t) returns (bool) where t in sim_type
        for u: t in rep$elements(s) do
            if t$similar(u, v) then return(true) end
            end
        return(false)
        end is_in_sim

    size = proc (s: cvt) returns (int)
        return(rep$size(s))
        end size

    elements = iter (s: cvt) yields (t)
        for v: t in rep$elements(s) do
            yield(v)
            end
        end elements

    equal = proc (s1, s2: cvt) returns (bool)
        return(s1 = s2)
        end equal

    similar = proc (s1, s2: set[t]) returns (bool) where t in sim_type
        if size(s1) ~= size(s2) then return(false) end
        for u: t in elements(s1) do
            if ~ is_in_sim(s2, u) then return(false) end
            end
        return(true)
        end similar

    similar1 = proc (s1, s2: set[t]) returns (bool)
        if size(s1) ~= size(s2) then return(false) end
        for u: t in elements(s1) do
            if ~ is_in(s2, u) then return(false) end
            end
        return(true)
        end similar1

    copy = proc (s: cvt) returns (cvt) where t has copy: proctype (t) returns (t)
        return(rep$copy(s))
        end copy

    copy1 = proc (s: cvt) returns (cvt)
        return(rep$copy1(s))
        end copy1

    end set
list = cluster [t: type] is create, cons, car, cdr, is_in, is_empty, elements, equal, similar, copy

    rep = oneof[pair: pair, empty: null]
    pair = struct[car: t, cdr: list[t]]

    create = proc () returns (cvt)
        return(rep$make_empty(nil))
        end create

    cons = proc (v: t, lst: list[t]) returns (cvt)
        return(rep$make_pair(pair${car: v, cdr: lst}))
        end cons

    car = proc (lst: cvt) returns (t) signals (empty)
        tagcase lst
            tag pair (p: pair): return(p.car)
            tag empty: signal empty
            end
        end car

    cdr = proc (lst: cvt) returns (list[t]) signals (empty)
        tagcase lst
            tag pair (p: pair): return(p.cdr)
            tag empty: signal empty
            end
        end cdr

    is_in = proc (lst: cvt, v: t) returns (bool) where t has equal: proctype (t, t) returns (bool)
        while true do
            tagcase lst
                tag empty: return(false)
                tag pair (p: pair): if p.car = v
                                       then return(true)
                                       else lst := down(p.cdr)
                                       end
                end
            end
        end is_in

    is_empty = proc (lst: cvt) returns (bool)
        return(rep$is_empty(lst))
        end is_empty

    elements = iter (lst: cvt) yields (t)
        tagcase lst
            tag pair (p: pair): yield(p.car)
                                for v: t in elements(p.cdr) do
                                    yield(v)
                                    end
            tag empty:
            end
        end elements

    equal = proc (lst1, lst2: cvt) returns (bool) where t has equal: proctype (t, t) returns (bool)
        return(lst1 = lst2)
        end equal

    similar = proc (lst1, lst2: cvt) returns (bool)
                where t has similar: proctype (t, t) returns (bool)
        return(rep$similar(lst1, lst2))
        end similar

    copy = proc (lst: cvt) returns (cvt) where t has copy: proctype (t) returns (t)
        return(rep$copy(lst))
        end copy

    end list
their state matters as well. 

The position taken in the list type generator of Fig. 7 is that the state of a list consists only of the 
identity of the objects in the list, and does not depend on their state. Hence, these lists are 
immutable. This explains why list has no similar1 or copy1 operations, and why equal, similar, and 
copy are implemented as they are. 
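
The effect of this position on clients can be seen in a small sketch. The procedure sum_list below is hypothetical (it is not part of the manual) and assumes only the list cluster of Fig. 7. Each call of cons produces a new list object and leaves its argument list untouched, so no operation ever changes the state of an existing list.

```clu
% Hypothetical client of the list cluster of Fig. 7.
sum_list = proc () returns (int)
    il = list[int]
    % Build the list <1, 2, 3>; each cons returns a new list object.
    l: il := il$cons(1, il$cons(2, il$cons(3, il$create())))
    total: int := 0
    for v: int in il$elements(l) do
        total := total + v          % visits 1, 2, 3 in that order
        end
    return(total)                   % 6
    end sum_list
```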

There are two restrictions on the kinds of constants that can be used in op_names of where 
clauses and type_sets. These restrictions eliminate certain ambiguities that would otherwise arise in 
type-checking. There is no need to understand or remember these restrictions, as the programs 
they affect are fairly bizarre, and have never occurred in practice. The rules are included here 
solely for completeness. 

The first restriction is that no type parameter, and no type identifier introduced in a type_set, 
can be used anywhere in an op_name constant. Thus, if t is a type parameter, an op_name of the 
form "compute[array[t]]" would be illegal. The second restriction deals with the way data 
abstractions depend on each other. If, in the interface of a data abstraction A, some data 
abstraction B is used in an op_name constant, we say that A is "restricted in terms of" B. We 
define r-uses to be the transitive closure of this relation. The second restriction, then, is that an 
abstraction cannot r-use itself. 

13.5 Own Variables 

Occasionally it is desirable to have a module that retains information internally between 
invocations. Without such an ability, the information would either have to be reconstructed at 
every invocation, which can be expensive (and may even be impossible if the information depends 
on previous invocations), or the information would have to be passed in through arguments, which 
is undesirable because the information is then subject to uncontrolled modification in other 
modules. 

Procedures, iterators, and clusters may all retain information through the use of own variables. 
An own variable is similar to a normal variable, except that it retains its denotation from one 
routine activation to the next, including recursive activations. Syntactically, own variable 
declarations must appear immediately after the equates in a routine or cluster body; they cannot 
appear in bodies nested within statements. Own variable declarations have the form 

    own_var ::= own decl
              | own idn : type_spec := expression
              | own decl , ... := invocation

Note that initialization is optional. 

Own variables are created when a program begins execution, and they always start out 
uninitialized. The own variables of a routine (including cluster operations) are initialized in 
textual order as part of the first invocation of that routine, before any statements in the body of 
the routine are executed. Cluster own variables are initialized in textual order as part of the first 
invocation of the first cluster operation to be invoked (even if the operation does not use the own 
variables). Cluster own variables are initialized before any operation own variables are initialized. 
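
As a minimal sketch of these rules, the hypothetical procedure next_id below (the name is an assumption of this example, not part of the manual) keeps a running count in an own variable. The initialization is performed once, as part of the first invocation of next_id; later invocations see whatever value the previous invocation left behind.

```clu
% Hypothetical example: a procedure that retains state in an own variable.
next_id = proc () returns (int)
    own count: int := 0        % initialized once, as part of the first invocation
    count := count + 1         % the new value is retained for the next invocation
    return(count)
    end next_id
```

Successive invocations return 1, 2, 3, and so on. Because the result depends on history, a routine like this should be used with the care described in the Remarks of this section.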

Aside from the placement of their declarations, the time of their initialization, and the lifetime 
of their denotations, own variables act just like normal variables and can be used in all the same 
places. As for normal variables, attempts to use uninitialized own variables (if not detected at 
compile-time) cause the run-time exception 

    failure("uninitialized variable") 

Own variable declarations in different modules always refer to distinct own variables, and 
distinct executions of programs never share own variables (even if the same module is used in 
several programs). Furthermore, own variable declarations within a parameterized module produce 
distinct own variables for each instantiation of the module. For a given instantiation of a 
parameterized cluster, all instantiations of the type's operations share the same set of cluster own 
variables, but distinct instantiations of parameterized operations have distinct routine own 
variables. For example, in the following cluster there is a distinct x and y for every type t, and a 
distinct z for every type-integer pair (t, i): 

    C = cluster [t: type] is ...

        own x: int := init(...) * 2

        P = proc (...)
            own y: ...

            end P

        Q = proc [i: int] (...)
            own z: ...

            end Q

        end C

Own variable declarations cannot be enclosed by an except statement, so care must be 
exercised when writing initialization expressions. If an exception is raised by an initialization 
expression, it will be treated as an exception raised, but not handled, in the body of the routine 
whose invocation caused the initialization to be attempted. This routine will then signal failure to 
its caller (see Section 12.2). In the example cluster above, if procedure P were the first operation of 
C[string] to be invoked, causing initialization of x to be attempted, then an overflow exception 
raised in the initialization of x would result in P signalling 

failure("unhandled exception: overflow") 
to its caller. 

Remarks 

Own variables are often useful in declaring "constants" that are either derived from 
complicated computations or are otherwise illegal in equates. In almost all such cases, the 
initialization can be attached directly to the declaration. For example, 

    own flip: complex := complex$create(0.0, 1.0) 
    own primes: sequence[int] := table_of_primes() 

However, the data denoted by own variables may also change dynamically, and may contain history 
information, as the following (fairly useless) module demonstrates: 

    delayer = proc [t: type, delay: int] (x: t) returns (t) signals (not_yet)
        at = array[t]
        own oldies: at := at$new()

        at$addh(oldies, x)                    % add to waiting list
        if at$size(oldies) > delay            % if delayed long enough
           then oldies.low := 1               % prevent eventual overflow
                return(at$reml(oldies))       % remove and return oldest
           else signal not_yet
           end
        end delayer

When cluster own variable initialization involves lengthy computations, one own variable can 
be initialized with an (internal) operation call, and the body of that operation can assign values 
directly to the other own variables: 

    C = cluster is ...

        own x: table := own_init()
        own y: table

        own_init = proc () returns (table)
            ...
            return(...)
            end own_init

        end C

On occasion, when a particular program is known to use exactly one object of a particular 
user-defined type, it is tempting to implement the type such that the sole object is denoted by a 
cluster own variable. In this way, the object need not be passed as an argument to the various 
routines in the computation, many of which do not even use the object directly. This is a poor 
design decision in most cases, because the ways in which the type can be used later are then 
severely restricted. For example, the type cannot then be used in any program requiring several 
objects of that type. It is usually better to design types in as general a manner as possible. 

With the introduction of own variables, procedures and iterators become potentially mutable 
objects. If the abstract behavior of a routine depends on history information (as does delayer 
above), then care must be exercised to guarantee that the routine is used correctly in other modules. 
(Ideally, a CLU system should have some method of controlling access to routines.) In general, own 
variables should not be used to modify the abstract behavior of a module. 
Appendix I - Syntax 

We use an extended BNF grammar to define the syntax. The general form of a production is: 

    nonterminal ::= alternative
                  | alternative
                  ...
                  | alternative

The following extensions are used: 

    a , ...    a list of one or more a's separated by commas: "a" or "a, a" or "a, a, a" etc.
    { a }      a sequence of zero or more a's: " " or "a" or "a a" etc.
    [ a ]      an optional a: " " or "a".

All semicolons are optional in CLU, but for simplicity they appear in the syntax as ";" rather 
than "[;]". Nonterminal symbols appear in normal face. Reserved words appear in bold face. All 
other terminal symbols are non-alphabetic, and appear in normal face. 



module       ::= { equate } procedure
               | { equate } iterator
               | { equate } cluster

procedure    ::= idn = proc [ parms ] args [ returns ] [ signals ] [ where ] ;
                     routine_body
                     end idn ;

iterator     ::= idn = iter [ parms ] args [ yields ] [ signals ] [ where ] ;
                     routine_body
                     end idn ;

cluster      ::= idn = cluster [ parms ] is idn , ... [ where ] ;
                     cluster_body
                     end idn ;

parms        ::= [ parm , ... ]

parm         ::= idn , ... : type
               | idn , ... : type_spec

args         ::= ( [ decl , ... ] )

decl         ::= idn , ... : type_spec

returns      ::= returns ( type_spec , ... )

yields       ::= yields ( type_spec , ... )

signals      ::= signals ( exception , ... )

exception    ::= name [ ( type_spec , ... ) ]

where        ::= where restriction , ...

restriction  ::= idn has oper_decl , ...
               | idn in type_set

type_set     ::= { idn | idn has oper_decl , ... ; { equate } }
               | idn

oper_decl    ::= op_name , ... : type_spec

op_name      ::= name [ [ constant , ... ] ]

constant     ::= expression
               | type_spec

routine_body ::= { equate }
                 { own_var }
                 { statement }

cluster_body ::= { equate } rep = type_spec ; { equate }
                 { own_var }
                 routine { routine }

routine      ::= procedure
               | iterator

equate       ::= idn = constant ;
               | idn = type_set ;

own_var      ::= own decl ;
               | own idn : type_spec := expression ;
               | own decl , ... := invocation ;

type_spec    ::= null
               | bool
               | int
               | real
               | char
               | string
               | any
               | rep
               | cvt
               | array [ type_spec ]
               | sequence [ type_spec ]
               | record [ field_spec , ... ]
               | struct [ field_spec , ... ]
               | oneof [ field_spec , ... ]
               | variant [ field_spec , ... ]
               | proctype ( [ type_spec , ... ] ) [ returns ] [ signals ]
               | itertype ( [ type_spec , ... ] ) [ yields ] [ signals ]
               | idn [ constant , ... ]
               | idn

field_spec   ::= name , ... : type_spec

statement    ::= decl ;
               | idn : type_spec := expression ;
               | decl , ... := invocation ;
               | idn , ... := invocation ;
               | idn , ... := expression , ... ;
               | primary . name := expression ;
               | primary [ expression ] := expression ;
               | invocation ;
               | while expression do body end ;
               | for [ decl , ... ] in invocation do body end ;
               | for [ idn , ... ] in invocation do body end ;
               | if expression then body
                     { elseif expression then body }
                     [ else body ]
                     end ;
               | tagcase expression
                     tag_arm { tag_arm }
                     [ others : body ]
                     end ;
               | return [ ( expression , ... ) ] ;
               | yield [ ( expression , ... ) ] ;
               | signal name [ ( expression , ... ) ] ;
               | exit name [ ( expression , ... ) ] ;
               | break ;
               | continue ;
               | begin body end ;
               | statement resignal name , ...
               | statement except { when_handler }
                     [ others_handler ]
                     end ;

tag_arm      ::= tag name , ... [ ( idn : type_spec ) ] : body

when_handler   ::= when name , ... [ ( decl , ... ) ] : body
                 | when name , ... ( * ) : body

others_handler ::= others [ ( idn : type_spec ) ] : body

body         ::= { equate }
                 { statement }

expression   ::= primary
               | ( expression )
               | ~ expression                      6 (precedence)
               | - expression                      6
               | expression ** expression          5
               | expression // expression          4
               | expression / expression           4
               | expression * expression           4
               | expression || expression          3
               | expression + expression           3
               | expression - expression           3
               | expression < expression           2
               | expression <= expression          2
               | expression = expression           2
               | expression >= expression          2
               | expression > expression           2
               | expression ~< expression          2
               | expression ~<= expression         2
               | expression ~= expression          2
               | expression ~>= expression         2
               | expression ~> expression          2
               | expression & expression           1
               | expression cand expression        1
               | expression | expression           0
               | expression cor expression         0

primary      ::= nil
               | true
               | false
               | int_literal
               | real_literal
               | char_literal
               | string_literal
               | idn
               | idn [ constant , ... ]
               | primary . name
               | primary [ expression ]
               | invocation
               | type_spec $ { field , ... }
               | type_spec $ [ [ expression : ] [ expression , ... ] ]
               | type_spec $ name [ [ constant , ... ] ]
               | force [ type_spec ]
               | up ( expression )
               | down ( expression )

invocation   ::= primary ( [ expression , ... ] )

field        ::= name , ... : expression



Reserved word: one of the identifiers appearing in bold face in the syntax. Upper and lower 
case letters are not distinguished in reserved words. 

Name, idn: a sequence of letters, digits, and underscores that begins with a letter or underscore, 
and that is not a reserved word. Upper and lower case letters are not distinguished in names and 
idns. 

Int_literal: a sequence of one or more decimal digits. 

Real_literal: a mantissa with an (optional) exponent. A mantissa is either a sequence of one or 
more decimal digits, or two sequences (one of which may be empty) joined by a period. The 
mantissa must contain at least one digit. An exponent is 'E' or 'e', optionally followed by '+' or '-', 
followed by one or more decimal digits. An exponent is required if the mantissa does not contain a 
period. 

Char_literal: either a printing ASCII character (octal value 40 thru 176), other than single quote 
or backslash, enclosed in single quotes, or one of the following escape sequences enclosed in single 
quotes: 

    escape sequence    character
    \'                 ' (single quote)
    \"                 " (double quote)
    \\                 \ (backslash)
    \n                 NL (newline)
    \t                 HT (horizontal tab)
    \p                 FF (newpage)
    \b                 BS (backspace)
    \r                 CR (carriage return)
    \v                 VT (vertical tab)
    \***               specified by octal value (exactly three octal digits)

The escape sequences may be written using upper case letters. 

String_literal: a sequence of zero or more character representations, enclosed in double quotes. 
A character representation is either a printing ASCII character other than double quote or 
backslash, or one of the escape sequences listed above. 

Comment: a sequence of characters that begins with a percent sign, ends with a newline 
character, and contains only printing ASCII characters and horizontal tabs in between. 

Separator: a blank character (space, vertical tab, horizontal tab, carriage return, newline, form 
feed) or a comment. Zero or more separators may appear between any two tokens, except that at 
least one separator is required between any two adjacent non-self-terminating tokens: reserved 
words, identifiers, integer literals, and real literals. 

Appendix II - Built-in Types and Type Generators 

The following sections describe the built-in types and the types produced by the built-in type 
generators. For each type, the objects of the type are characterized, and all operations of the type 
are defined (with the exception of the encode and decode operations, which are defined in 
Appendix III, Section 6). 

In defining an operation, arg1, arg2, etc., refer to the arguments (the objects, not the syntactic 
expressions), and res refers to the result of the operation. If execution of an operation terminates 
in an exception, we say the exception "occurs". By convention, the order in which exceptions are 
listed in the operation type is the order in which the various conditions are checked. 

The definition of an operation consists of an interface specification and an explanation of the 
relation between arguments and results. An interface specification has the form 

    name: type_spec side_effects
        restrictions

If side_effects is null, no side-effects can occur. "PSE" (primary side-effect) indicates that the state 
of arg1 may change. "SSE" (secondary side-effect) indicates that a state change may occur in some 
object that is contained in an argument.¹ Restrictions, if present, is either a standard where 
clause, or a clause of the form 

    where each T_i has oper_decl_i

which is an abbreviation for 

    where T_1 has oper_decl_1, ..., T_n has oper_decl_n

Arithmetic expressions and comparisons used in defining operations are to be computed over 
the domain of mathematical integers or the domain of mathematical reals; the particular domain 
will be clear from context. 

Definitions of several of the types will involve tuples. A tuple is written <e_1, ..., e_n>; e_i is 
called the i-th element. A tuple with n elements is called an n-tuple. We define the following 
functions on tuples: 

    Size(<e_1, ..., e_n>) ≡ n
    A = B ≡ (Size(A) = Size(B)) ∧ (∀i | 1 ≤ i ≤ Size(A))[a_i = b_i]
    <a, ..., b> || <c, ..., d> ≡ <a, ..., b, c, ..., d>
    Front(<a, ..., b, c>) ≡ <a, ..., b>
    Tail(<a, b, ..., c>) ≡ <b, ..., c>
    Tail^0(A) ≡ A and Tail^(n+1)(A) ≡ Tail(Tail^n(A))
    Occurs(A, B, i) ≡ (∃C,D)[(B = C || A || D) ∧ (Size(C) = i - 1)]

If Occurs(A, B, i) holds, we say that A occurs in B at index i. 

¹ For operations of the built-in types, secondary side-effects occur when a subsidiary abstraction 
performs unwanted side-effects. For example, side-effects are not expected when 
array[T]$similar calls T$similar, but their absence cannot be guaranteed. 

II.1. Null 

There is one immutable object of type null, denoted nil. 

equal: proctype (null, null) returns (bool) 
similar: proctype (null, null) returns (bool) 

    Both operations always return true. 

copy: proctype (null) returns (null) 

    Copy always returns nil. 

II.2. Bool 

There are two immutable objects of type bool, denoted true and false. These objects 
represent logical truth values. 

and: proctype (bool, bool) returns (bool) 
or: proctype (bool, bool) returns (bool) 
not: proctype (bool) returns (bool) 

    These are the standard logical operations. 

equal: proctype (bool, bool) returns (bool) 
similar: proctype (bool, bool) returns (bool) 

    These two operations return true if and only if both arguments are the same object. 

copy: proctype (bool) returns (bool) 

    Copy simply returns its argument. 

II.3. Int 

Objects of type int are immutable, and are intended to model the mathematical integers. 
However, the only restriction placed on an implementation is that some closed interval 
[Int_Min, Int_Max] be represented, with Int_Min < 0 and Int_Max > 0. An overflow exception 
is signalled by an operation if the result of that operation would lie outside this interval. 

add: proctype (int, int) returns (int) signals (overflow) 
sub: proctype (int, int) returns (int) signals (overflow) 
mul: proctype (int, int) returns (int) signals (overflow) 

    The standard integer addition, subtraction, and multiplication operations. 

minus: proctype (int) returns (int) signals (overflow) 

    Minus returns the negative of its argument. 

div: proctype (int, int) returns (int) signals (zero_divide, overflow) 

    Div computes the integer quotient of arg1 and arg2: 

        ∃r [(0 ≤ r < |arg2|) ∧ (arg1 = arg2*res + r)] 

    Zero_divide occurs if arg2 = 0. 

mod: proctype (int, int) returns (int) signals (zero_divide, overflow) 

    Mod computes the integer remainder of dividing arg1 by arg2. That is, 

        ∃q [(0 ≤ res < |arg2|) ∧ (arg1 = arg2*q + res)] 

    Zero_divide occurs if arg2 = 0. 
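
The two definitions above require the remainder to be non-negative whatever the signs of the arguments, which differs from the truncating division found in many other languages. The following sketch works through three cases; div_mod_demo and its variable names are hypothetical, and the values in the comments are derived by hand from the definitions.

```clu
% Hypothetical example: quotient and remainder for mixed signs.
div_mod_demo = proc ()
    q1: int := int$div(7, 2)       % 3,  since  7 = 2*3 + 1
    r1: int := int$mod(7, 2)       % 1
    q2: int := int$div(-7, 2)      % -4, since -7 = 2*(-4) + 1
    r2: int := int$mod(-7, 2)      % 1, not -1
    q3: int := int$div(7, -2)      % -3, since  7 = (-2)*(-3) + 1
    r3: int := int$mod(7, -2)      % 1
    end div_mod_demo
```

In each case arg1 = arg2*div(arg1, arg2) + mod(arg1, arg2) holds, with the remainder in the range 0 to |arg2| - 1.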

power: proctype (int, int) returns (int) signals (negative_exponent, overflow) 

    This operation computes arg1 raised to the arg2 power. Power(0, 0) ≡ 1. 
    Negative_exponent occurs if arg2 < 0. 

from_to_by: itertype (int, int, int) yields (int) 

    This iterator yields, in succession, arg1, arg1 + arg3, arg1 + 2*arg3, etc., as long as the 
    value to yield, x, satisfies x ≤ arg2 when arg3 > 0, or arg2 ≤ x when arg3 < 0. The 
    iterator continually yields arg1 if arg3 = 0. The iterator yields nothing when 
    (arg1 > arg2) ∧ (arg3 > 0) or when (arg1 < arg2) ∧ (arg3 < 0). 

from_to: itertype (int, int) yields (int) 

    from_to(arg1, arg2) is equivalent to from_to_by(arg1, arg2, 1). 
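
A short sketch of the iterator with a negative step follows. The procedure count_down is hypothetical; it collects the yielded values in an array so that the order is visible.

```clu
% Hypothetical example: iterating downward with a step of -3.
count_down = proc () returns (array[int])
    ai = array[int]
    seen: ai := ai$new()
    for i: int in int$from_to_by(10, 1, -3) do
        ai$addh(seen, i)           % appends 10, 7, 4, 1 in turn
        end
    return(seen)
    end count_down
```

The value 1 is yielded because arg2 ≤ x still holds for x = 1; the next candidate, -2, is not. By the last rule above, int$from_to_by(1, 10, -3) would yield nothing at all.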

parse: proctype (string) returns (int) signals (bad_format, overflow) 

    This operation computes the exact value corresponding to an integer literal. The 
    argument must be an integer literal, with an optional leading plus or minus sign. 
    Bad_format occurs if the argument is not of this form. 

unparse: proctype (int) returns (string) 

    Unparse produces an integer literal such that parse(unparse(arg1)) = arg1. Leading 
    zeros are suppressed, and no leading plus sign is added for positive integers. 

lt: proctype (int, int) returns (bool) 
le: proctype (int, int) returns (bool) 
ge: proctype (int, int) returns (bool) 
gt: proctype (int, int) returns (bool) 

    The standard ordering relations. 

equal: proctype (int, int) returns (bool) 
similar: proctype (int, int) returns (bool) 

    These two operations return true if and only if both arguments are the same object. 

copy: proctype (int) returns (int) 

    Copy simply returns its argument. 

II.4. Real 

Objects of type real are immutable, and are intended to model the mathematical real numbers. 
However, only a subset of 

    D = [-Real_Max, -Real_Min] ∪ {0} ∪ [Real_Min, Real_Max] 

need be represented, where 0 < Real_Min < 1 < Real_Max. Call this subset Real. We require that 
both 0 and 1 be elements of Real. If the exact value of a real literal lies in D, then the value in 
CLU is given by a function Approx, which satisfies the following axioms: 

    ∀ r ∈ D          Approx(r) ∈ Real 
    ∀ r ∈ Real       Approx(r) = r 
    ∀ r ∈ D - {0}    |(Approx(r) - r)/r| < 10^(1-p) 
    ∀ r,s ∈ D        r ≤ s ⇒ Approx(r) ≤ Approx(s) 
    ∀ r ∈ D          Approx(-r) = -Approx(r) 

The constant p is the precision of the approximation, and must be at least 7. 

We define Max_width and Exp_width to be the smallest integers such that every non-zero 
element of Real can be represented in "standard" form (exactly one digit, not zero, before the 
decimal point) with no more than Max_width digits of mantissa and no more than Exp_width 
digits of exponent. 

add: proctype (real, real) returns (real) signals (overflow, underflow) 
sub: proctype (real, real) returns (real) signals (overflow, underflow) 
mul: proctype (real, real) returns (real) signals (overflow, underflow) 
minus: proctype (real) returns (real) 
div: proctype (real, real) returns (real) signals (zero_divide, overflow, underflow) 

    These operations satisfy the following axioms: 

        1) (a,b ≥ 0 ∨ a,b ≤ 0) ⇒ add(a, b) = Approx(a + b) 
        2) add(a, b) = (1 + ε)(a + b),  |ε| < 10^(1-p) 
        3) add(a, 0) = a 
        4) add(a, b) = add(b, a) 
        5) a ≤ a' ⇒ add(a, b) ≤ add(a', b) 
        6) minus(a) = -a 
        7) sub(a, b) = add(a, -b) 
        8) mul(a, b) = Approx(a * b) 
        9) div(a, b) = Approx(a / b) 

    In axiom 2, the value of p is the same as that used in defining Approx. Note that the 
    infix and prefix expressions above are computed over the mathematical real numbers. 
    The axioms only hold if no exceptions occur. An exception occurs if the result of an 
    exact computation lies outside of D; overflow occurs if the magnitude exceeds 
    Real_Max, and underflow occurs if the magnitude is less than Real_Min. Zero_divide 
    occurs if arg2 = 0. 

power: proctype (real, real) returns (real) 
            signals (zero_divide, complex_result, overflow, underflow) 

    This operation computes arg1 raised to the arg2 power. Zero_divide occurs if 
    (arg1 = 0) ∧ (arg2 < 0). Complex_result occurs if arg1 < 0 and arg2 is non-integral. 
    Overflow and underflow occur as explained above. 

i2r: proctype (int) returns (real) signals (overflow) 

    I2r returns a real number corresponding to the argument: res = Approx(arg1). Overflow 
    occurs if arg1 lies outside the domain D. 

r2i: proctype (real) returns (int) signals (overflow) 

    R2i rounds to the nearest integer, and toward zero in case of a tie: 

        (|res - arg1| ≤ 1/2) ∧ (|res| < |arg1| + 1/2) 

    Overflow occurs if the result lies outside the domain for CLU integers. 

trunc: proctype (real) returns (int) signals (overflow) 

    Trunc truncates its argument toward zero: (|res - arg1| < 1) ∧ (|res| ≤ |arg1|). Overflow 
    occurs if the result lies outside the domain for CLU integers. 
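
The difference between r2i and trunc shows up most clearly at ties and for negative arguments. The sketch below is hypothetical (round_demo is not part of the manual) and assumes that the literals used are elements of Real, so that Approx leaves them unchanged; the values in the comments follow from the two conditions above.

```clu
% Hypothetical example: rounding versus truncation.
round_demo = proc ()
    a: int := real$r2i(2.5)        % 2:  a tie is resolved toward zero
    b: int := real$r2i(-2.5)       % -2: likewise for negative arguments
    c: int := real$r2i(2.6)        % 3:  nearest integer
    d: int := real$trunc(2.9)      % 2:  fraction discarded
    e: int := real$trunc(-2.9)     % -2: truncation is toward zero
    end round_demo
```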

exponent: proctype (real) returns (int) signals (undefined) 

    This operation returns the exponent that would be used in representing arg1 as a literal 
    in standard form: res = max({i | |arg1| ≥ 10^i}). Undefined occurs if arg1 = 0.0. 

mantissa: proctype (real) returns (real) 

    This operation returns the mantissa of arg1 when represented in standard form: 

        res = Approx(arg1 / 10^exponent(arg1)) 

    If arg1 = 0.0 the result is 0.0. 

parse: proctype (string) returns (real) signals (bad_format, overflow, underflow) 

    This operation computes the exact value corresponding to a real or integer literal, and 
    then returns the result of applying Approx to that value. The argument must be a real 
    or integer literal, with an optional leading plus or minus sign. Bad_format occurs if the 
    argument is not of this form. Overflow occurs if the magnitude of the exact value of 
    the literal exceeds Real_Max; underflow occurs if the magnitude is less than Real_Min. 

unparse: proctype (real) returns (string) 

    Unparse produces a real literal such that parse(unparse(arg1)) = arg1. The general form 
    of the literal is: 

        [-]i_field.f_field[e±x_field] 

    Leading zeros in i_field and trailing zeros in f_field are suppressed. If arg1 is integral 
    and within the range of CLU integers, then f_field and the exponent are not present. 
    If arg1 can be represented by a mantissa of no more than Max_width digits and no 
    exponent (i.e., -1 ≤ exponent(arg1) < Max_width), then the exponent is not present. 
    Otherwise, the literal is in standard form, with Exp_width digits of exponent. 

lt: proctype (real, real) returns (bool) 
le: proctype (real, real) returns (bool) 
ge: proctype (real, real) returns (bool) 
gt: proctype (real, real) returns (bool) 

    The standard ordering relations. 

equal: proctype (real, real) returns (bool) 
similar: proctype (real, real) returns (bool) 

    These two operations return true if and only if both arguments are the same object. 

copy: proctype (real) returns (real) 

    Copy simply returns its argument. 

II.5. Char 

Objects of type char are immutable, and represent characters. Every implementation must 
provide at least 128, but no more than 512, characters. Characters are numbered from 0 to some 
Char_Top, and this numbering defines the ordering for the type. The first 128 characters are the 
ASCII characters in their standard order. 

i2c: proctype (int) returns (char) signals (illegal_char) 

    I2c returns the character corresponding to the argument. Illegal_char occurs if the 
    argument is not in the range [0, Char_Top]. 

c2i: proctype (char) returns (int) 

    This operation returns the number corresponding to the argument. 

lt: proctype (char, char) returns (bool) 
le: proctype (char, char) returns (bool) 
ge: proctype (char, char) returns (bool) 
gt: proctype (char, char) returns (bool) 

    The ordering relations consistent with the numbering of characters. 

equal: proctype (char, char) returns (bool) 
similar: proctype (char, char) returns (bool) 

    These two operations return true if and only if the two arguments are the same object. 

copy: proctype (char) returns (char) 

    Copy simply returns its argument. 

II.6. String 

Objects of type string are immutable. Each string represents a tuple of characters. The i-th 
character of the string is the i-th element of the tuple. There are an infinite number of strings, but 
an implementation need only support a finite number. Attempts to construct illegal strings result in 
a failure exception. 

size: proctype (string) returns (int) 

    This operation simply returns the size of the tuple represented by the argument. 

empty: proctype (string) returns (bool) 

    This operation returns true if and only if size(arg1) = 0. 

indexs: proctype (string, string) returns (int) 

    If arg1 occurs in arg2, this operation returns the least index at which arg1 occurs: 

        res = min({i | Occurs(arg1, arg2, i)}) 

    Note that the result is 1 if arg1 is the 0-tuple. The result is 0 if arg1 does not occur. 

indexc: proctype (char, string) returns (int) 

    If <arg1> occurs in arg2, the result is the least index at which <arg1> occurs: 

        res = min({i | Occurs(<arg1>, arg2, i)}) 

    The result is 0 if <arg1> does not occur. 

c2s: proctype (char) returns (string) 

    This operation returns the string representing the 1-tuple <arg1>. 

concat: proctype (string, string) returns (string) 

    Concat returns the string representing the tuple arg1 || arg2. 

append: proctype (string, char) returns (string) 

    This operation returns the string representing the tuple arg1 || <arg2>. 

fetch: proctype (string, int) returns (char) signals (bounds) 

    Fetch returns the arg2-th character of arg1. Bounds occurs if arg2 < 1 or 
    arg2 > size(arg1). 

rest: proctype (string, int) returns (string) signals (bounds) 

    The result of this operation is Tail^(arg2-1)(arg1). Bounds occurs if arg2 < 1 or 
    arg2 > size(arg1) + 1. 

substr: proctype (string, int, int) returns (string) signals (bounds, negative_size) 

    If arg3 ≤ size(rest(arg1, arg2)), the result is the string representing the tuple of size arg3 
    which occurs in arg1 at index arg2. Otherwise, the result is rest(arg1, arg2). Bounds 
    occurs if arg2 < 1 or arg2 > size(arg1) + 1. Negative_size occurs if arg3 < 0. 
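
The indexing conventions of these operations are easy to get wrong by one, so a few concrete cases may help. The procedure string_demo below is hypothetical; the values in the comments are worked out by hand from the definitions above, with all indexes starting at 1.

```clu
% Hypothetical example: searching and slicing the string "abcdef".
string_demo = proc ()
    s: string := "abcdef"
    i: int := string$indexs("cd", s)        % 3: "cd" occurs in s at index 3
    j: int := string$indexc('z', s)         % 0: 'z' does not occur in s
    c: char := string$fetch(s, 1)           % 'a': the first character
    r: string := string$rest(s, 3)          % "cdef": the first two characters are dropped
    t: string := string$substr(s, 2, 3)     % "bcd": three characters starting at index 2
    u: string := string$substr(s, 5, 10)    % "ef": fewer than 10 characters remain
    end string_demo
```

Note that substr does not signal an exception when arg3 exceeds the number of remaining characters; it simply returns rest(arg1, arg2).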

s2ac: proctype (string) returns (array[char])

This operation places the characters of the argument as elements of a new array of
characters. The low bound of the array is 1, and the size of the array is size(arg1). The
i-th element of the array is the i-th character of the string.

ac2s: proctype (array[char]) returns (string)

Ac2s serves as the inverse of s2ac. The result is the string with characters in the same
order as in the argument. That is, the i-th character of the result is the
(i + low(arg1) - 1)-th element of the argument.

s2sc: proctype (string) returns (sequence[char])

This operation transforms a string into a sequence of characters. The size of the
sequence is size(arg1). The i-th element of the sequence is the i-th character of the string.

sc2s: proctype (sequence[char]) returns (string)

Sc2s serves as the inverse of s2sc. The result is the string with characters in the same
order as in the argument. That is, the i-th character of the result is the i-th element of the
argument.

chars: itertype (string) yields (char)

This iterator yields, in order, each character of the argument.

lt: proctype (string, string) returns (bool)
le: proctype (string, string) returns (bool)
ge: proctype (string, string) returns (bool)
gt: proctype (string, string) returns (bool)

These are the usual lexicographic orderings based on the ordering for characters. The
lt operation is equivalent to the following:

    lt = proc (x, y: string) returns (bool)
        size_x: int := string$size(x)
        size_y: int := string$size(y)
        min: int
        if size_x <= size_y
           then min := size_x
           else min := size_y
           end
        for i: int in int$from_to(1, min) do
            if x[i] ~= y[i] then return(x[i] < y[i]) end
            end
        return(size_x < size_y)
        end lt

equal: proctype (string, string) returns (bool)
similar: proctype (string, string) returns (bool)

These two operations return true if and only if both arguments are the same object.

copy: proctype (string) returns (string)

Copy simply returns its argument.

II.7. Array Types



The array type generator defines an infinite class of types. For every type T there is a type
array[T]. Arrays are mutable objects. The state [1] of an object of type array[T] consists of:

a) an integer Low, called the low bound, and

b) a tuple Elts of objects of type T, called the elements.

We also define Size = Size(Elts), and High = Low + Size - 1. We want to think of the elements of
Elts as being numbered from Low, so we define the array_index of the i-th element to be
(i + Low - 1).

For any array, Low, High, and Size must be legal integers. Any attempt to create or modify
an array in violation of this rule results in a failure exception. Note that for all array operations,
if an exception other than failure occurs, the states of all array arguments are unchanged from
those at the time of invocation.

create: proctype (int) returns (array[T])

This operation returns a new array for which Low is arg1 and Elts is the 0-tuple.

new: proctype () returns (array[T])

This is equivalent to create(1).

predict: proctype (int, int) returns (array[T])

Predict is essentially the same as create(arg1), in that it returns a new array for which
Low is arg1 and Elts is the 0-tuple. However, if arg2 is greater than (less than) 0, it is
assumed that at least |arg2| addh's (addl's) will be performed on the array. These
subsequent operations may execute somewhat faster.

low: proctype (array[T]) returns (int)

high: proctype (array[T]) returns (int)

size: proctype (array[T]) returns (int)

These operations return Low, High, and Size, respectively.

empty: proctype (array[T]) returns (bool)

This operation returns true if and only if Size = 0.



[1] For an array A, we should properly write Low_A, etc., to refer to the state of that particular
object, but subscripts will be dropped when the association seems clear.

set_low: proctype (array[T], int)                                            PSE

Set_low makes Low equal to arg2.

trim: proctype (array[T], int, int) signals (bounds, negative_size)          PSE

This operation makes Low equal to arg2, and makes Elts equal to the tuple of size
min(arg3, High' - arg2 + 1) which occurs in Elts' at index arg2 - Low' + 1 [2]. That is,
every element with array_index less than arg2, or greater than or equal to arg2 + arg3,
is removed. Bounds occurs if arg2 < Low' or arg2 > High' + 1. Negative_size occurs if
arg3 < 0. Note that this operation is somewhat like string$substr.
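The index arithmetic in trim is the part most easily misread, so a worked model may help. The following Python sketch treats an array state as a (low, elts) pair and returns the new state; the function name, the pair representation, and the exception classes are illustrative assumptions, and the order of the two checks is not specified by the text.

```python
class Bounds(Exception):
    """Stand-in for the CLU 'bounds' signal."""


class NegativeSize(Exception):
    """Stand-in for the CLU 'negative_size' signal."""


def trim(low, elts, new_low, count):
    """Return the (Low, Elts) state that trim would leave behind."""
    high = low + len(elts) - 1
    if new_low < low or new_low > high + 1:
        raise Bounds()
    if count < 0:
        raise NegativeSize()
    size = min(count, high - new_low + 1)
    # The text's 1-based tuple index is new_low - low + 1; as a 0-based
    # Python offset that is new_low - low.
    start = new_low - low
    return new_low, elts[start:start + size]
```

With low = 5 and four elements, trimming to (6, 2) keeps the elements whose array_indexes are 6 and 7, and those indexes are unchanged afterwards.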

fill: proctype (int, int, T) returns (array[T]) signals (negative_size)

Fill creates a new array for which Low is arg1 and Elts is an arg2-tuple in which every
element is arg3. Negative_size occurs if arg2 < 0.

fill_copy: proctype (int, int, T) returns (array[T]) signals (negative_size)      SSE

where T has copy: proctype (T) returns (T)

This operation is equivalent to the following:

    fill_copy = proc (nlow, nsize: int, elt: T) returns (at) signals (negative_size)
                where T has copy: proctype (T) returns (T)
        at = array[T]
        if nsize < 0 then signal negative_size end
        x: at := at$predict(nlow, nsize)
        for i: int in int$from_to(1, nsize) do
            at$addh(x, T$copy(elt))
            end
        return(x)
        end fill_copy

fetch: proctype (array[T], int) returns (T) signals (bounds)

Fetch returns the element of arg1 with array_index arg2. Bounds occurs if arg2 < Low
or arg2 > High.

bottom: proctype (array[T]) returns (T) signals (bounds)
top: proctype (array[T]) returns (T) signals (bounds)

These operations return the elements with array_indexes Low and High, respectively.
Bounds occurs if Size = 0.

store: proctype (array[T], int, T) signals (bounds)                          PSE

Store makes Elts a new tuple which differs from the old in that arg3 is the element
with array_index arg2. Bounds occurs if arg2 < Low or arg2 > High.



[2] Elts', High', etc. refer to the state just prior to invoking the operation.

addh: proctype (array[T], T)                                                 PSE

This operation makes Elts the new tuple Elts' || <arg2>.

addl: proctype (array[T], T)                                                 PSE

This operation makes Low equal to Low' - 1, and makes Elts the tuple <arg2> || Elts'.
Decrementing Low keeps the array_indexes of the previous elements the same.

remh: proctype (array[T]) returns (T) signals (bounds)                       PSE

Remh makes Elts the tuple Front(Elts'), and returns the deleted element. Bounds occurs
if Size' = 0.

reml: proctype (array[T]) returns (T) signals (bounds)                       PSE

Reml makes Low equal to Low' + 1, makes Elts the tuple Tail(Elts'), and returns the
deleted element. Incrementing Low keeps the array_indexes of the remaining elements
the same. Bounds occurs if Size' = 0.
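The claim that addl and reml leave the array_indexes of the other elements unchanged can be seen in a small model of the array state. The Python class below is only a sketch under assumed names (Array, Bounds); it keeps Low and Elts explicitly and derives High from them, as in the definitions at the start of this section.

```python
class Bounds(Exception):
    """Stand-in for the CLU 'bounds' signal."""


class Array:
    """Toy model of array[T] state: a low bound plus a list of elements."""

    def __init__(self, low=1):
        self.low = low
        self.elts = []

    @property
    def high(self):
        return self.low + len(self.elts) - 1

    def fetch(self, index):
        if index < self.low or index > self.high:
            raise Bounds()
        return self.elts[index - self.low]

    def addh(self, x):
        self.elts.append(x)

    def addl(self, x):
        # Decrement Low so existing elements keep their array_indexes.
        self.low -= 1
        self.elts.insert(0, x)

    def remh(self):
        if not self.elts:
            raise Bounds()
        return self.elts.pop()

    def reml(self):
        if not self.elts:
            raise Bounds()
        # Increment Low so the remaining elements keep their array_indexes.
        self.low += 1
        return self.elts.pop(0)
```

After addh("b"), addh("c"), and addl("a") on an array created with low bound 1, the element "b" is still fetched with index 1 while "a" sits at index 0.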

elements: itertype (array[T]) yields (T)

This iterator is equivalent to the following:

    elements = iter (x: at) yields (T)
        at = array[T]
        for i: int in int$from_to(at$low(x), at$high(x)) do
            yield(x[i])
            end
        end elements

indexes: itertype (array[T]) yields (int)

This iterator is equivalent to int$from_to(Low', High').

equal: proctype (array[T], array[T]) returns (bool)

Equal returns true if and only if both arguments are the same object.

similar: proctype (array[T], array[T]) returns (bool)                        SSE

where T has similar: proctype (T, T) returns (bool)

This operation is equivalent to the following:

    similar = proc (x, y: at) returns (bool)
              where T has similar: proctype (T, T) returns (bool)
        at = array[T]
        if at$low(x) ~= at$low(y) cor at$size(x) ~= at$size(y)
           then return(false)
           end
        for i: int in at$indexes(x) do
            if ~T$similar(x[i], y[i]) then return(false) end
            end
        return(true)
        end similar

similar1: proctype (array[T], array[T]) returns (bool)                       SSE

where T has equal: proctype (T, T) returns (bool)

Similar1 works in the same way as similar, except that T$equal is used instead of
T$similar.

copy1: proctype (array[T]) returns (array[T])

Copy1 creates a new array with the same state as the argument.

copy: proctype (array[T]) returns (array[T])                                 SSE

where T has copy: proctype (T) returns (T)

This operation is equivalent to the following:

    copy = proc (x: at) returns (at) where T has copy: proctype (T) returns (T)
        at = array[T]
        x := at$copy1(x)
        for i: int in at$indexes(x) do
            x[i] := T$copy(x[i])
            end
        return(x)
        end copy
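The difference between copy1 and copy only shows up when the elements are themselves mutable. The Python sketch below uses lists as stand-ins for arrays and passes the element copy operation explicitly, which is an assumption made to mimic the "where T has copy" requirement; it is not CLU code.

```python
def copy1(x):
    """New container sharing the same element objects (one level of copying)."""
    return list(x)


def copy(x, elem_copy):
    """New container whose elements are produced by the element type's copy."""
    result = copy1(x)
    for i in range(len(result)):
        result[i] = elem_copy(result[i])
    return result


inner = [1, 2]
outer = [inner]
shallow = copy1(outer)
deep = copy(outer, list)   # list() plays the role of T$copy here

inner.append(3)            # mutate the shared element afterwards
```

After the mutation, shallow[0] reflects the change because it is the same object as inner, while deep[0] does not.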



II.8. Sequence Types

The sequence type generator defines an infinite class of types. For every type T there is a
type sequence[T]. An object of type sequence[T] consists of a tuple, Elts, of objects of type T,
called the elements of the sequence. Sequences are immutable objects: a particular sequence always
represents exactly the same tuple of objects. However, if the objects in the tuple are mutable, then
the state of those objects may change.

For convenience, we define Size = Size(Elts). The elements of a sequence are numbered from 1
to Size. For any sequence, Size must be a legal integer; any attempt to create a sequence that
violates this rule results in a failure exception.

new: proctype () returns (sequence[T])

This operation returns the empty sequence.

size: proctype (sequence[T]) returns (int)

This operation returns Size.

empty: proctype (sequence[T]) returns (bool)

Empty returns true if and only if Size = 0.

subseq: proctype (sequence[T], int, int) returns (sequence[T])
        signals (bounds, negative_size)

If arg3 <= Size - arg2 + 1 then the result is the tuple of size arg3 occurring in arg1
starting at index arg2. Otherwise, the result is the tuple Tail^(arg2 - 1)(arg1). Bounds occurs
if arg2 < 1 or arg2 > Size + 1. Negative_size occurs if arg3 < 0.

fill: proctype (int, T) returns (sequence[T]) signals (negative_size)

Fill returns the sequence for which Elts is the arg1-tuple in which every element is arg2.
Negative_size occurs if arg1 < 0.

fill_copy: proctype (int, T) returns (sequence[T]) signals (negative_size)   SSE

This operation is equivalent to the following:

    fill_copy = proc (nsize: int, elt: T) returns (qt) signals (negative_size)
                where T has copy: proctype (T) returns (T)
        qt = sequence[T]
        if nsize < 0 then signal negative_size end
        x: qt := qt$new()
        for i: int in int$from_to(1, nsize) do
            x := qt$addh(x, T$copy(elt))
            end
        return(x)
        end fill_copy

fetch: proctype (sequence[T], int) returns (T) signals (bounds)

Fetch returns the arg2-th element of arg1. Bounds occurs if arg2 < 1 or arg2 > Size.

bottom: proctype (sequence[T]) returns (T) signals (bounds)
top: proctype (sequence[T]) returns (T) signals (bounds)

These operations return the first and last elements of arg1, respectively. Bounds occurs
if Size = 0.

replace: proctype (sequence[T], int, T) returns (sequence[T]) signals (bounds)

This operation returns a new sequence whose arg2-th element is arg3, but which is
otherwise the same as arg1. Bounds occurs if arg2 < 1 or arg2 > Size.
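Because sequences are immutable, replace builds a new sequence instead of updating its argument, in contrast to array store. A minimal Python sketch of that behavior follows, using tuples as a stand-in for sequence[T]; the function name and the Bounds class are illustrative assumptions.

```python
class Bounds(Exception):
    """Stand-in for the CLU 'bounds' signal."""


def replace(seq: tuple, index: int, x) -> tuple:
    """Return a new tuple equal to seq except that the 1-based element index is x."""
    if index < 1 or index > len(seq):
        raise Bounds()
    # Elements before and after the replaced position are carried over as they
    # are; the original tuple is never modified.
    return seq[:index - 1] + (x,) + seq[index:]
```

Calling replace((1, 2, 3), 2, 9) gives a new tuple with 9 in the second position and leaves the original tuple as it was.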

addh: proctype (sequence[T], T) returns (sequence[T])

Addh returns the sequence representing the tuple Elts || <arg2>.

addl: proctype (sequence[T], T) returns (sequence[T])

Addl returns the sequence representing the tuple <arg2> || Elts.

remh: proctype (sequence[T]) returns (sequence[T]) signals (bounds)

Remh returns the sequence representing the tuple Front(Elts). Bounds occurs if
Size = 0.

reml: proctype (sequence[T]) returns (sequence[T]) signals (bounds)

Reml returns the sequence representing the tuple Tail(Elts). Bounds occurs if Size = 0.

e2s: proctype (T) returns (sequence[T])

This operation returns the sequence representing the singleton tuple <arg1>.

concat: proctype (sequence[T], sequence[T]) returns (sequence[T])

Concat returns the sequence representing the tuple arg1 || arg2.

a2s: proctype (array[T]) returns (sequence[T])

This operation returns the tuple corresponding to the elements part of the state of arg1.

s2a: proctype (sequence[T]) returns (array[T])

This operation returns a new array with low bound 1 and with Elts as the elements part
of the array state.

elements: itertype (sequence[T]) yields (T)

This iterator yields, in order, each element of Elts.

indexes: itertype (sequence[T]) yields (int)

This iterator is equivalent to int$from_to(1, Size).

equal: proctype (sequence[T], sequence[T]) returns (bool)                    SSE

where T has equal: proctype (T, T) returns (bool)

Equal is equivalent to the following:

    equal = proc (x, y: qt) returns (bool)
            where T has equal: proctype (T, T) returns (bool)
        qt = sequence[T]
        if qt$size(x) ~= qt$size(y) then return(false) end
        for i: int in qt$indexes(x) do
            if x[i] ~= y[i] then return(false) end
            end
        return(true)
        end equal

similar: proctype (sequence[T], sequence[T]) returns (bool)                  SSE

where T has similar: proctype (T, T) returns (bool)

Similar works in the same way as equal, except that T$similar is used instead of
T$equal.

copy: proctype (sequence[T]) returns (sequence[T])                           SSE

where T has copy: proctype (T) returns (T)

This operation is equivalent to the following:

    copy = proc (x: qt) returns (qt) where T has copy: proctype (T) returns (T)
        qt = sequence[T]
        y: qt := qt$new()
        for e: T in qt$elements(x) do
            y := qt$addh(y, T$copy(e))
            end
        return(y)
        end copy

II.9. Record Types

The record type generator defines an infinite class of types. For every tuple of name/type
pairs <(N1, T1), ..., (Nn, Tn)>, where all the names are distinct, in lower case, and in lexicographic
order, there is a type record[N1:T1, ..., Nn:Tn]. (However, the user may write this type with the
pairs permuted, and may use upper case letters in names.) Records are mutable objects. The state
of a record of type record[N1:T1, ..., Nn:Tn] is an n-tuple; the i-th element of the tuple is of type Ti.
The i-th element is also called the Ni-component.

create: proctype (T1, ..., Tn) returns (record[N1:T1, ..., Nn:Tn])

This operation returns a new record with the tuple <arg1, ..., argN> as its state. This
operation is not available to the user; its use is implicit in the record constructor (see
Section 10.6).

get_Ni: proctype (record[N1:T1, ..., Nn:Tn]) returns (Ti)

This operation returns the Ni-component of the argument. There is a get_Ni operation
for each Ni.

set_Ni: proctype (record[N1:T1, ..., Nn:Tn], Ti)                             PSE

This operation makes the state of arg1 a new tuple which differs from the old in that
the Ni-component is arg2. There is a set_Ni operation for each Ni.

equal: proctype (record[N1:T1, ..., Nn:Tn], record[N1:T1, ..., Nn:Tn]) returns (bool)

Equal returns true if and only if both arguments are the same object.

similar: proctype (record[N1:T1, ..., Nn:Tn], record[N1:T1, ..., Nn:Tn]) returns (bool)     SSE

where each Ti has similar: proctype (Ti, Ti) returns (bool)

Corresponding components of arg1 and arg2 are compared in (lexicographic) order,
using Ti$similar for the Ni-components. (The Ni-component of arg1 becomes the first
argument.) If a comparison results in false, the result of the operation is false, and no
further comparisons are made. If all comparisons return true, the result is true.

similar1: proctype (record[N1:T1, ..., Nn:Tn], record[N1:T1, ..., Nn:Tn]) returns (bool)    SSE

where each Ti has equal: proctype (Ti, Ti) returns (bool)

Similar1 works in the same way as similar, except that Ti$equal is used instead of
Ti$similar.
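The comparison order and the early exit described for similar and similar1 can be illustrated with a short model. In this Python sketch a record is represented as a dict from component names to values, and the per-component comparison procedures are passed in as a dict of functions; both representations are assumptions made for illustration. Passing the components' similar procedures models similar, and passing their equal procedures models similar1.

```python
def record_similar(r1: dict, r2: dict, compare: dict) -> bool:
    """Compare components in lexicographic name order, stopping at the first False.

    r1 and r2 are assumed to have the same component names, as two records
    of the same type would.
    """
    for name in sorted(r1):
        # The component of the first argument is passed first.
        if not compare[name](r1[name], r2[name]):
            return False
    return True
```

Because the loop returns at the first false comparison, components later in lexicographic order are never compared, which matters when the component operations have side effects.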

copy1: proctype (record[N1:T1, ..., Nn:Tn]) returns (record[N1:T1, ..., Nn:Tn])

Copy1 returns a new record with the same state as the argument.

copy: proctype (record[N1:T1, ..., Nn:Tn]) returns (record[N1:T1, ..., Nn:Tn])              SSE

where each Ti has copy: proctype (Ti) returns (Ti)

This operation is equivalent to the following (note that the Ni are in lexicographic
order):

    copy = proc (x: rt) returns (rt)
           where T1 has copy: proctype (T1) returns (T1),
                 ...
                 Tn has copy: proctype (Tn) returns (Tn)
        rt = record[N1:T1, ..., Nn:Tn]
        x := rt$copy1(x)
        x.N1 := T1$copy(x.N1)
        ...
        x.Nn := Tn$copy(x.Nn)
        return(x)
        end copy

II.10. Structure Types

The struct type generator defines an infinite class of types. For every tuple of name/type
pairs <(N1, T1), ..., (Nn, Tn)>, where all the names are distinct, in lower case, and in lexicographic
order, there is a type struct[N1:T1, ..., Nn:Tn]. (However, the user may write this type with the
pairs permuted, and may use upper case letters in names.) Structures are immutable objects. A
structure of type struct[N1:T1, ..., Nn:Tn] is an n-tuple; the i-th element of the tuple is of type Ti.
The i-th element is also called the Ni-component.

create: proctype (T1, ..., Tn) returns (struct[N1:T1, ..., Nn:Tn])

This operation returns the structure representing the tuple <arg1, ..., argN>. This
operation is not available to the user; its use is implicit in the structure constructor (see
Section 10.6).

get_Ni: proctype (struct[N1:T1, ..., Nn:Tn]) returns (Ti)

This operation returns the Ni-component of the argument. There is a get_Ni operation
for each Ni.

replace_Ni: proctype (struct[N1:T1, ..., Nn:Tn], Ti) returns (struct[N1:T1, ..., Nn:Tn])

This operation returns the tuple corresponding to arg1 with its Ni-component replaced
by arg2. There is a replace_Ni operation for each Ni.

s2r: proctype (struct[N1:T1, ..., Nn:Tn]) returns (record[N1:T1, ..., Nn:Tn])

S2r returns a new record whose initial state is the tuple represented by the argument.

r2s: proctype (record[N1:T1, ..., Nn:Tn]) returns (struct[N1:T1, ..., Nn:Tn])

R2s returns the structure representing the tuple that is the current state of the argument.

equal: proctype (struct[N1:T1, ..., Nn:Tn], struct[N1:T1, ..., Nn:Tn]) returns (bool)       SSE

where each Ti has equal: proctype (Ti, Ti) returns (bool)

Corresponding components of arg1 and arg2 are compared in (lexicographic) order,
using Ti$equal for the Ni-components. (The Ni-component of arg1 becomes the first
argument.) If a comparison results in false, the result of the operation is false, and no
further comparisons are made. If all comparisons return true, the result is true.

similar: proctype (struct[N1:T1, ..., Nn:Tn], struct[N1:T1, ..., Nn:Tn]) returns (bool)     SSE

where each Ti has similar: proctype (Ti, Ti) returns (bool)

Similar works in the same way as equal, except that Ti$similar is used instead of
Ti$equal.

copy: proctype (struct[N1:T1, ..., Nn:Tn]) returns (struct[N1:T1, ..., Nn:Tn])              SSE

where each Ti has copy: proctype (Ti) returns (Ti)

This operation is equivalent to the following (note that the Ni are in lexicographic
order):

    copy = proc (x: st) returns (st)
           where T1 has copy: proctype (T1) returns (T1),
                 ...
                 Tn has copy: proctype (Tn) returns (Tn)
        st = struct[N1:T1, ..., Nn:Tn]
        return(st${N1: T1$copy(x.N1),
                   ...
                   Nn: Tn$copy(x.Nn)})
        end copy

II.11. Oneof Types

The oneof type generator defines an infinite class of types. For every tuple of name/type
pairs <(N1, T1), ..., (Nn, Tn)>, where all of the names are distinct, in lower case, and in
lexicographic order, there is a type oneof[N1:T1, ..., Nn:Tn]. (However, the user may write this type
with the pairs permuted, and may use upper case letters in names.) Oneofs are immutable objects.
Each oneof represents a name/object pair (Ni, X), where X is of type Ti. For each object X of
type Ti there is a oneof for the pair (Ni, X). Ni is called the tag of the oneof, and X is called the
value.

make_Ni: proctype (Ti) returns (oneof[N1:T1, ..., Nn:Tn])

This operation returns the oneof for the pair (Ni, arg1). There is a make_Ni operation
for each Ni.

is_Ni: proctype (oneof[N1:T1, ..., Nn:Tn]) returns (bool)

This operation returns true if and only if the tag of the argument is Ni. There is an
is_Ni operation for each Ni.

value_Ni: proctype (oneof[N1:T1, ..., Nn:Tn]) returns (Ti) signals (wrong_tag)

If the argument has tag Ni, the result is the value component of the argument.
Wrong_tag occurs if the tag is other than Ni. There is a value_Ni operation for each
Ni.

o2v: proctype (oneof[N1:T1, ..., Nn:Tn]) returns (variant[N1:T1, ..., Nn:Tn])

This operation returns a new variant with an initial state that has the same tag and
value as the argument.

v2o: proctype (variant[N1:T1, ..., Nn:Tn]) returns (oneof[N1:T1, ..., Nn:Tn])

This operation returns the oneof with the same tag and value as the current state of the
argument.
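The make, is, and value families of operations amount to a tagged union with a checked projection. The Python sketch below models that behavior; the class name Oneof, the WrongTag exception, and the choice to pass the tag as a string argument (rather than having one operation per tag) are all illustrative assumptions.

```python
class WrongTag(Exception):
    """Stand-in for the CLU 'wrong_tag' signal."""


class Oneof:
    """Toy model of an immutable tag/value pair."""

    def __init__(self, tag: str, value):
        # Stored once at creation; no method changes them afterwards.
        self._tag = tag
        self._value = value

    def is_tag(self, tag: str) -> bool:
        """Models is_Ni: true if and only if the tag is the one asked about."""
        return self._tag == tag

    def value(self, tag: str):
        """Models value_Ni: the value if the tag matches, otherwise wrong_tag."""
        if self._tag != tag:
            raise WrongTag()
        return self._value
```

A caller would normally test is_tag before calling value, or handle WrongTag, mirroring how a CLU program guards value_Ni.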

equal: proctype (oneof[N1:T1, ..., Nn:Tn], oneof[N1:T1, ..., Nn:Tn]) returns (bool)         SSE

where each Ti has equal: proctype (Ti, Ti) returns (bool)

If arg1 and arg2 have different tags, the result is false. If both tags are Ni, the result
is that of invoking Ti$equal with the two value components.

similar: proctype (oneof[N1:T1, ..., Nn:Tn], oneof[N1:T1, ..., Nn:Tn]) returns (bool)       SSE

where each Ti has similar: proctype (Ti, Ti) returns (bool)

If arg1 and arg2 have different tags, the result is false. If both tags are Ni, the result
is that of invoking Ti$similar with the two value components.

copy: proctype (oneof[N1:T1, ..., Nn:Tn]) returns (oneof[N1:T1, ..., Nn:Tn])                SSE

where each Ti has copy: proctype (Ti) returns (Ti)

If arg1 represents the pair (Ni, X), then the result is the oneof for the pair
(Ni, Ti$copy(X)).

II.12. Variant Types

The variant type generator defines an infinite class of types. For every tuple of name/type
pairs <(N1, T1), ..., (Nn, Tn)>, where all of the names are distinct, in lower case, and in
lexicographic order, there is a type variant[N1:T1, ..., Nn:Tn]. (However, the user may write this
type with the pairs permuted, and may use upper case letters in names.) Variants are mutable
objects. The state of a variant consists of a name/object pair (Ni, X), where X is of type Ti. For
each object X of type Ti there is a state (Ni, X). Ni is called the current tag of the variant, and X
is called the current value.

make_Ni: proctype (Ti) returns (variant[N1:T1, ..., Nn:Tn])

This operation returns a new variant whose initial state is the pair (Ni, arg1). There is
a make_Ni operation for each Ni.

change_Ni: proctype (variant[N1:T1, ..., Nn:Tn], Ti)                         PSE

This operation changes the state of arg1 to be the pair (Ni, arg2). There is a change_Ni
operation for each Ni.

is_Ni: proctype (variant[N1:T1, ..., Nn:Tn]) returns (bool)

This operation returns true if and only if the current tag of the argument is Ni. There
is an is_Ni operation for each Ni.

value_Ni: proctype (variant[N1:T1, ..., Nn:Tn]) returns (Ti) signals (wrong_tag)

If the current tag of the argument is Ni, then the current value component is returned.
Wrong_tag occurs if the current tag is other than Ni. There is a value_Ni operation for
each Ni.

equal: proctype (variant[N1:T1, ..., Nn:Tn], variant[N1:T1, ..., Nn:Tn]) returns (bool)

This operation returns true if and only if arg1 and arg2 are the same object.

similar: proctype (variant[N1:T1, ..., Nn:Tn], variant[N1:T1, ..., Nn:Tn]) returns (bool)   SSE

where each Ti has similar: proctype (Ti, Ti) returns (bool)

If arg1 and arg2 have different tags, the result is false. If both tags are Ni, the result
is that of invoking Ti$similar with the two value components.

similar1: proctype (variant[N1:T1, ..., Nn:Tn], variant[N1:T1, ..., Nn:Tn]) returns (bool)  SSE

where each Ti has equal: proctype (Ti, Ti) returns (bool)

If arg1 and arg2 have different tags, the result is false. If both tags are Ni, the result
is that of invoking Ti$equal with the two value components.






copy: proctype (variant[N1:T1, ..., Nn:Tn]) returns (variant[N1:T1, ..., Nn:Tn])            SSE

where each Ti has copy: proctype (Ti) returns (Ti)

If the current state of the argument is (Ni, X), then the result is a new variant whose
initial state is (Ni, Ti$copy(X)).

copy1: proctype (variant[N1:T1, ..., Nn:Tn]) returns (variant[N1:T1, ..., Nn:Tn])

If the current state of the argument is (Ni, X), then the result is a new variant whose
initial state is also (Ni, X).

II.13. Procedure and Iterator Types

Let A, R, L1, ..., Ln be ordered lists of types, and let N1, ..., Nn be distinct names in lower case
and in lexicographic order. Then there is a type

    proctype (A) returns (R) signals (N1(L1), ..., Nn(Ln))

and a type

    itertype (A) yields (R) signals (N1(L1), ..., Nn(Ln)).

(The user may permute the Ni(Li)'s, and may use upper case letters in names. If R is empty then
"returns (R)" is not written. "(Li)" is not written if Li is empty, and "signals (...)" is not written if
n = 0.)

The create operations are not available to the user; routines are created by compiling modules.

Let T be a procedure (or iterator) type in the following.

equal: proctype (T, T) returns (bool)
similar: proctype (T, T) returns (bool)

These operations return true if and only if both arguments are the same
implementation of the same abstraction, with the same parameters.

copy: proctype (T) returns (T)

Copy simply returns its argument.

II.14. Any

The type any is the union of all types. There are no operations for the type any. Thus, for
example, no array[any]$copy operation exists.

Appendix III - Input/Output

This appendix describes a set of standard "library" data types and procedures for CLU,
provided primarily to support I/O. We do not consider this facility to be part of the language
proper, but felt the need for a set of commonly-used functions that have some meaning on most
systems. This facility is minimal because we wished it to be general, i.e., to be implementable, at
least in large part, under almost any operating system. The facility also provides a framework in
which some other operations that are not always available can be expressed.

Some thought was given to portability of programs, and possibly even data, but we expect that 
programs dealing with all but the simplest I/O will have to be written very carefully to be portable, 
and might not be portable no matter how careful one is. 

The following additional types are described: 

stream - provides access to text files 
istream - provides access to image files 
file_name - a naming scheme for files
date - calendar date and time

No type "file" exists, as will be explained.

III.1. Files

Our notion of file is a general one that includes not only storage files (disk files), but also 
terminals and other devices (e.g. tape drives). Each file will in general support only a subset of the 
operations described here. 

There are two basic kinds of files, text files and image files. The two kinds of files may be
incompatible. However, on any particular system, it may not be possible to determine what kind a
given file is.

A text file consists of a sequence of characters, and is divided into lines terminated by newline
('\n') characters. A non-empty last line might not be terminated. By convention, the start of a new
page is indicated by placing a newpage ('\p') character at the beginning of the first line of that
page.

A text file will be stored in the (most appropriate) standard text file format of the local
operating system. As a result, certain control characters (e.g., NUL, CR, FF, ^C, ^Z) may be
ignored when written. In addition, a system may limit the maximum length of lines and may add
(remove) trailing spaces to (from) lines.

Image files are provided to allow more efficient storage of information than is provided by
text files. Unlike text files, there is no need for image files to be compatible with any local file 
format; thus, image files can be defined more precisely than text files. 

An image file consists of a sequence of encoded objects. Objects are written and read using 
encode and decode operations of their types. (These in turn will call encode and decode on their 
components until basic types are reached.) The objects stored in an image file are not tagged by 
the system according to their types. Thus, if a file is written by performing a specific sequence of 
encode operations, then it must be read back using the corresponding sequence of decode operations 
to be meaningful. 
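The consequence of storing objects without type tags is that the reader must mirror the writer exactly. The Python sketch below illustrates this with two hypothetical encodings, an 8-byte little-endian integer and a length-prefixed UTF-8 string; these formats and function names are assumptions chosen for the example and are not the encodings used by any CLU implementation.

```python
import io
import struct


def encode_int(f, n: int) -> None:
    """Write n as 8 little-endian bytes, with no type tag."""
    f.write(struct.pack("<q", n))


def decode_int(f) -> int:
    return struct.unpack("<q", f.read(8))[0]


def encode_string(f, s: str) -> None:
    """A string encodes its length (via encode_int) and then its bytes."""
    data = s.encode("utf-8")
    encode_int(f, len(data))
    f.write(data)


def decode_string(f) -> str:
    n = decode_int(f)
    return f.read(n).decode("utf-8")


# Writer: an int followed by a string.
image = io.BytesIO()
encode_int(image, 42)
encode_string(image, "clu")

# Reader: must perform the corresponding decodes in the same order.
image.seek(0)
first = decode_int(image)
second = decode_string(image)
```

Nothing in the byte stream records that the first object was an integer; decoding in a different order would silently misinterpret the data, which is why the matching sequence of decode operations is required.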

III.2. File Names 

File names are immutable objects used to name files. The system file name format is viewed
as consisting of four string components:

directory - specifies a file directory or device
name - the primary name of the file (e.g. "thesis")
suffix - a name normally indicating the type of file (e.g. "clu" for a CLU source file)
other - all other components of the system file name form

The directory and other components may have internal syntax. The name and suffix should be
short identifiers. (For example, in the TOPS-20 file name "ps:<cluser>ref.lpt.3", the directory is
"ps:<cluser>", the name is "ref", the suffix is "lpt", and the other is "3". In the UNIX path name
"/usr/snyder/doc/refman.r", the directory is "/usr/snyder/doc", the name is "refman", the suffix is
"r", and there is no other.)

A null component has the following interpretation:

directory - denotes the current "working" directory. (For example, the
"connected directory" on TOPS-20 and the "current directory"
on UNIX. See also Section 8 of this appendix.)

name - may be illegal, have a unique interpretation, or be ignored.
(For example, on TOPS-20, a null name is illegal for most
directories, but for some devices, the name is ignored.)

suffix - may be illegal, have a unique interpretation, or be ignored.
(For example, on TOPS-20, a null suffix is legal, as in
"<rws>foo".)

other - should imply a reasonable default.

The operations on file names are:



create: proctype (string, string, string, string) returns (file_name)
        signals (bad_format)

This operation creates a file name from its components. Arg1 is the directory part,
arg2 is the name part, arg3 is the suffix part, and arg4 is the other part for the new
file name. In the process of creating a file name, the string arguments may be
transformed, e.g. by truncation or case-conversion.

get_dir: proctype (file_name) returns (string)
get_name: proctype (file_name) returns (string)
get_suffix: proctype (file_name) returns (string)
get_other: proctype (file_name) returns (string)

These operations return string forms of the components of a file name. If the file
name was created using the create operation, then the strings returned may be
different from those given as arguments to create, e.g., they may be truncated or
case-converted.

parse: proctype (string) returns (file_name) signals (bad_format)

This operation creates a file name given a string in the system standard file name
syntax.

unparse: proctype (file_name) returns (string)

This operation transforms a file name into the system standard file name syntax.
We require that

    parse(unparse(fn)) = fn
    create(fn.dir, fn.name, fn.suffix, fn.other) = fn

for all file names fn. One implication of this rule is that there can be no file name
that can be created by create but not by parse; if a system does have file names that
have no string representation in the system standard file name syntax, then create
must reject those file names as having a bad format. Alternatively, the file name
syntax must be extended so that it can express all possible file names.
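The round-trip requirement can be tried out on a toy model of UNIX-style names. The sketch below is Python, not CLU, and the FileName tuple and the parse and unparse functions are hypothetical stand-ins that split a path the way the UNIX example earlier in this section does; the "other" component is always null in this model.

```python
import posixpath
from typing import NamedTuple


class FileName(NamedTuple):
    dir: str
    name: str
    suffix: str
    other: str


def parse(s: str) -> FileName:
    """Split a UNIX-style path into directory, name, and suffix."""
    directory, base = posixpath.split(s)
    name, ext = posixpath.splitext(base)
    return FileName(directory, name, ext[1:], "")  # drop the leading "."


def unparse(fn: FileName) -> str:
    """Rebuild the path string from the components."""
    base = fn.name + ("." + fn.suffix if fn.suffix else "")
    return posixpath.join(fn.dir, base)


fn = parse("/usr/snyder/doc/refman.r")
```

Here fn has directory "/usr/snyder/doc", name "refman", and suffix "r", and parse(unparse(fn)) gives back an equal FileName, which is the property the rule above demands of a real implementation.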

make_output: proctype (filejiame, string) returns (filejiame) signals (bad Jormat) 

This operation is used by programs that take input from a file and write new files 
whose names are based on the input file name. The operation transforms the file 
name into one that is suitable for an output file. The transformation is done as 
follows: (1) the suffix is set to the given suffix (argl); <2> if the old directory is not 
suitable for writing, then it is set to null; (3) the name, if null and meaningless, is set 
to "output". (Examples of directories that may not be suitable for writing are 
directories that involve transferring files over a slow network.) 

make_temp: proctype (string, string, string) returns (file_name) signals (bad_format) 

This operation creates a file name appropriate for a temporary file, using the given 
preferred directory name (arg1), program name (arg2), and file identifier (arg3). To 
be useful, both the program name and the file identifier should be short and 
alphabetic. The returned file name, when used as an argument to stream$open or 
istream$open to open a new file for writing, is guaranteed to create a new file, and 
will not overwrite an existing file. Further file name references to the created file 
should be made using the name returned by the stream or istream get_name 
operation. 
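
A sketch of the intended usage follows. The procedure name write_scratch and the 
strings "sort" and "tmp" are assumptions chosen for illustration; the point is that the 
name used for later references comes from get_name, not from make_temp. 

```clu
% write_scratch is hypothetical; "sort" and "tmp" are arbitrary short names.
write_scratch = proc (dir: string, text: string) returns (file_name)
                signals (not_possible(string))
    fn: file_name := file_name$make_temp(dir, "sort", "tmp")
        except when bad_format:
                    signal not_possible("cannot form temporary file name")
               end
    begin
        st: stream := stream$open(fn, "write")
        stream$puts(st, text)
        % Ask the stream for the actual name before closing it.
        actual: file_name := stream$get_name(st)
        stream$close(st)
        return(actual)
        end resignal not_possible
    end write_scratch
```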

equal: proctype (file_name, file_name) returns (bool) 

Returns true if and only if the two file names will unparse to equal strings. 

similar: proctype (file_name, file_name) returns (bool) 

The same as the equal operation. 

copy: proctype (file_name) returns (file_name) 

Copy simply returns its argument. 

III.3. A File Type? 



Although files are the basic information-containing objects in this package, we do not 
recommend that a file type be introduced. The reason for this recommendation is that few systems 
provide an adequate representation for files. 

On many systems, the most reliable representation of a file (accessible to the user) is a channel 
(stream) to that file. However, this representation is inappropriate for a CLU file type, since 
possession of a channel to a file often implies locking that file. 

Another possible representation is a file name. However, file names are one level indirect from 
files, via the file directory. As a result, the relationship of a file name to a file object is 
time-varying. Using file names as a representation for files would imply that all file operations 
could signal non_existent_file. 

Therefore, operations related to file objects are performed by two stream clusters, stream and 
istream, and operations related to the directory system are performed by procedures. 

Note that two opens for read with the same file name might return streams to two different 
files. We cannot guarantee anything about what may happen to a file after a program obtains a 
stream to it. 

III.4. Streams 

Streams provide the means to read and write text files, and to perform some other operations 
on file objects. The operations allowed on any particular stream depend upon the access mode. In 
addition, certain operations may be null in some implementations. 

When an operation cannot be performed, because of an incorrect access mode, because of 
implementation limitations, or because of properties of an individual file or device, then the 
operation will signal not_possible (unless the description of the operation explicitly says that the 
invocation will be ignored). 

The PSE and SSE indicators used in the previous appendix will not be used here; in many 
cases the exact form (and time) of change depends on the particular operating system. 

open: proctype (file_name, string) returns (stream) signals (not_possible(string)) 

The possible access modes (arg2) are "read", "write", and "append". If arg2 is not 
one of these strings, not_possible("bad access mode") is signalled. In those cases 
where the system is able to detect that the specified pre-existing file is not a text file, 
not_possible("wrong file type") is signalled. 

If the mode is "read", then the named file must exist. If the file exists, a stream is 
returned upon which input operations can be performed. 

If the mode is "write", a new file is created or an old file is rewritten. A stream is 
returned upon which output operations can be performed. 

If the mode is "append", then if the named file does not exist, one is created. A 
stream is returned, positioned at the end of the file, upon which output operations 
can be performed. Append mode to storage files should guarantee exclusive access 
to the file, if possible. 
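
As an illustration of the access modes, the following sketch appends one line to a file, 
creating the file if it does not exist. The procedure name log_line and the exception 
name cannot_log are hypothetical. 

```clu
% log_line and cannot_log are hypothetical names used only for this sketch.
log_line = proc (name: string, line: string) signals (cannot_log(string))
    fn: file_name := file_name$parse(name)
        except when bad_format: signal cannot_log("bad file name") end
    % "append" positions the stream at the end of the file.
    st: stream := stream$open(fn, "append")
        except when not_possible (why: string): signal cannot_log(why) end
    stream$putl(st, line)
        except when not_possible (why: string):
                    stream$close(st)
                    signal cannot_log(why)
               end
    stream$close(st)
    end log_line
```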

primary_input: proctype returns (stream) 

This operation returns the "primary" input stream, suitable for reading. This is 
usually a stream to the user's terminal, but may be set by the operating system. 

primary_output: proctype returns (stream) 

This operation returns the "primary" output stream, suitable for writing. This is 
usually a stream to the user's terminal, but may be set by the operating system. 

error_output: proctype returns (stream) 

This operation returns the "primary" output stream for error messages, suitable for 
writing. This is usually a stream to the user's terminal, but may be set by the 
operating system. 

can_read: proctype (stream) returns (bool) 

Can_read returns true if input operations appear possible on the stream. 

can_write: proctype (stream) returns (bool) 

Can_write returns true if output operations appear possible on the stream. 

getc: proctype (stream) returns (char) signals (end_of_file, not_possible(string)) 

This input operation removes the next character from the stream and returns it. 

peekc: proctype (stream) returns (char) signals (end_of_file, not_possible(string)) 

This input operation is like getc, except that the character is not removed from the 
stream. 
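
Peekc and getc are typically used together when a character must be examined before 
deciding whether to consume it. A minimal sketch, in which skip_blanks is a 
hypothetical procedure name: 

```clu
% skip_blanks is a hypothetical helper: discard leading spaces on st.
skip_blanks = proc (st: stream) signals (not_possible(string))
    while stream$peekc(st) = ' ' do
        c: char := stream$getc(st)    % consume the blank just examined
        end
        except when end_of_file:      % nothing left to skip
               when not_possible (why: string): signal not_possible(why)
               end
    end skip_blanks
```

After the procedure returns, the next getc yields the first non-blank character, if any. 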



empty: proctype (stream) returns (bool) signals (not_possible(string)) 

This input operation returns true if and only if there are no more characters in the 
stream. It is equivalent to a call of peekc, where false is returned if peekc returns a 
character and true is returned if peekc signals end_of_file. Thus in the case of 
terminals, for example, this operation may wait until additional characters have been 
typed by the user. 

putc: proctype (stream, char) signals (not_possible(string)) 

This output operation appends the given character to the stream. Writing a newline 
indicates the end of the current line. 



putc_image: proctype (stream, char) signals (not_possible(string)) 

This output operation is like putc, except that an arbitrary character may be written 
and the character is not interpreted by the CLU I/O system. (For example, the ITS 
XGP program expects a text file containing certain escape sequences. An escape 
sequence consists of a special character followed by a fixed number of arbitrary 
characters. These characters could be the same as an end-of-line mark, but they are 
recognized as data by their context. On a record-oriented system, such characters 
would be part of the data. In either case, writing a newline in image mode would 
not be interpreted by the CLU system as indicating an end-of-line.) 

getc_image: proctype (stream) returns (char) signals (end_of_file, not_possible(string)) 

This input operation is provided to read escape sequences in text files, as might be 
written using putc_image. Using this operation inhibits the recognition of 
end-of-line marks, where used. 

get_lineno: proctype (stream) returns (int) signals (end_of_file, not_possible(string)) 

This input operation returns the line number of the current (being or about to be 
read) line. If the system maintains explicit line numbers in the file, those line 
numbers are returned. Otherwise, lines are implicitly numbered, starting with 1. 

set_lineno: proctype (stream, int) signals (not_possible(string)) 

If the system maintains explicit line numbers in the file, this output operation sets 
the line number of the next (not yet started) line. Otherwise, it is ignored. 

reset: proctype (stream) signals (not_possible(string)) 

This operation resets the stream so that the next input or output operation will read 
or write the first character in the file. The line number is reset to its initial value. 

flush: proctype (stream) 

Any buffered output is written to the file, if possible. Otherwise, there is no effect. 
This operation should be used for streams that record the progress of a program. It 
can be used to maximize the amount of recorded status visible to the user or 
available in case the program dies. 

get_line_length: proctype (stream) returns (int) signals (no_limit) 

If the file or device to which the stream is attached has a natural maximum line 
length, then that length is returned. Otherwise, no_limit is signalled. The line 
length does not include newline characters. 

get_page_length: proctype (stream) returns (int) signals (no_limit) 

If the device to which the stream is attached has a natural maximum page length, 
then that length is returned. Otherwise, no_limit is signalled. Storage files will 
generally not have page lengths. 

get_date: proctype (stream) returns (date) signals (not_possible(string)) 

This operation returns the date of the last modification of the corresponding storage 
file. 

set_date: proctype (stream, date) signals (not_possible(string)) 

This operation sets the modification date of the corresponding storage file. (The 
modification date is set automatically when a file is opened in "write" or "append" 
mode.) 

get_name: proctype (stream) returns (file_name) signals (not_possible(string)) 

This operation returns the name of the corresponding file. It may be different than 
the name used to open the file, in that defaults have been resolved and link 
indirections have been followed. 

close: proctype (stream) 

This operation terminates I/O and removes the association between the stream and 
the file. Further use of operations that signal not_possible will signal not_possible. 

is_closed: proctype (stream) returns (bool) 

This operation returns true iff the stream is closed. 

is_terminal: proctype (stream) returns (bool) 

This operation returns true iff the stream is attached to an interactive terminal (see 
below). 

getl: proctype (stream) returns (string) signals (end_of_file, not_possible(string)) 

This input operation reads and returns (the remainder of) the current input line and 
reads but does not return the terminating newline (if any). This operation signals 
end_of_file only if there were no characters and end-of-file was detected. 

putl: proctype (stream, string) signals (not_possible( string)) 

This output operation writes the characters of the string onto the stream, followed by 
a newline. 
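
Getl and putl together give a simple line-by-line copy. The following is a sketch only; 
copy_lines, src, and dst are hypothetical names. 

```clu
% copy_lines is a hypothetical helper: copy the rest of src to dst, line by line.
copy_lines = proc (src, dst: stream) signals (not_possible(string))
    while true do
        % getl strips the terminating newline; putl writes one back.
        stream$putl(dst, stream$getl(src))
        end
        except when end_of_file:      % normal termination of the loop
               when not_possible (why: string): signal not_possible(why)
               end
    end copy_lines
```

Because putl always writes a newline, a final input line that lacks a terminating 
newline is copied with one added. 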

gets: proctype (stream, string) returns (string) signals (end_of_file, not_possible(string)) 

This input operation reads characters until a terminating character (one in arg2) or 
end-of-file is seen. The characters up to the terminator are returned; the terminator 
(if any) is left in the stream. This operation signals end_of_file only if there were 
no characters and end-of-file was detected. 

puts: proctype (stream, string) signals (not_possible(string)) 

This output operation simply writes the characters in the string using putc. 
Naturally it may be somewhat more efficient than doing a series of individual putc's. 

putzero: proctype (stream, string, int) signals (negative_field_width, not_possible(string)) 

Output the string. However, if the length of the string is less than the field width 
(arg3), then also output the appropriate number of extra zeros before the first digit 
or '.' in the string (or at the end, if no such characters). 

putleft: proctype (stream, string, int) signals (negative_field_width, not_possible(string)) 

Output the string. However, if the length of the string is less than arg3, then also 
output the appropriate number of extra spaces after the string. 

putright: proctype (stream, string, int) signals (negative_field_width, not_possible(string)) 

Output the string. However, if the length of the string is less than arg3, then also 
output the appropriate number of extra spaces before the string. 

putspace: proctype (stream, int) signals (negative_field_width, not_possible(string)) 

This operation outputs arg2 spaces. 
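
The padding operations are convenient for printing aligned columns. The sketch below 
is illustrative only; print_row and the field widths 12, 6, 2, and 4 are assumptions. 

```clu
% print_row is a hypothetical helper that writes one row of a table.
print_row = proc (st: stream, name: string, count: int, id: int)
            signals (not_possible(string))
    begin
        stream$putleft(st, name, 12)                 % name, padded on the right
        stream$putright(st, int$unparse(count), 6)   % count, padded on the left
        stream$putspace(st, 2)                       % gap between columns
        stream$putzero(st, int$unparse(id), 4)       % id, with leading zeros
        stream$putc(st, '\n')                        % end the line
        end resignal not_possible
    end print_row
```

For the arguments "apples", 42, and 7, the row consists of the name followed by six 
spaces, then four spaces and 42, then two spaces, then 0007. Since the widths are 
non-negative constants, negative_field_width cannot occur here. 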

equal: proctype (stream, stream) returns (bool) 

Returns true if and only if both arguments are the same stream. 

similar: proctype (stream, stream) returns (bool) 

Returns true if and only if both arguments are the same stream. 

copy: proctype (stream) returns (stream) 

Returns its argument. 

III.5. String I/O 

It is occasionally useful to be able to construct a stream that, rather than being connected to a 
file, instead simply collects the output text into a string. Conversely, it is occasionally useful to be 
able to take a string and convert it into a stream so that it can be given to a procedure that expects 
a stream. The following stream operations allow these functions to be performed: 

create_input: proctype (string) returns (stream) 

An input stream is created that will return the characters in the given string. If the 
string is non-empty and does not end with a newline, then an extra terminating 
newline will be appended to the stream. 

create_output: proctype returns (stream) 

An output stream is created that will collect output text in an internal buffer. The 
text may be extracted using the get_contents operation. 

get_contents: proctype (stream) returns (string) signals (not_possible(string)) 

This operation returns the text that has so far been output to the stream. It will 
signal not_possible if the stream was not created by create_output. 

A stream to a string does not have a file name, a creation date, a maximum line or page 
length, or explicit line numbers. 
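
For example, output can be collected into a string and then returned. This is a minimal 
sketch; the procedure name label is hypothetical. 

```clu
% label is a hypothetical helper: build "name: count" using a string stream.
label = proc (name: string, count: int) returns (string)
        signals (not_possible(string))
    begin
        st: stream := stream$create_output()
        stream$puts(st, name)
        stream$puts(st, ": ")
        stream$puts(st, int$unparse(count))
        % The collected text is available at any time.
        return(stream$get_contents(st))
        end resignal not_possible
    end label
```

Any procedure that writes to a stream can be used in this way without change, which 
is the main reason for routing string construction through a stream. 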

III.6. Istreams 

Istreams provide the means to read and write image files, and to perform some other 
operations on file objects. The operations allowed on any particular istream depend upon the 
access mode. In addition, certain operations may be null in some implementations. 

When an operation cannot be performed, because of an incorrect access mode, because of 
implementation limitations, or because of properties of an individual file or device, then the 
operation will signal not_possible (unless the description of the operation explicitly says that the 
invocation will be ignored). 

Actual reading and writing of objects is performed by encode and decode operations of the 
types involved. All of the built-in CLU types, and the file_name and date types, provide these 
operations. Designers of abstract types are encouraged to provide them also. The type 
specifications of the encode and decode operations for a type T are: 

encode: proctype (T, istream) signals (not_possible(string)) 

The encode operations are output operations. They write an encoding of the given 
object onto the istream. 

decode: proctype (istream) returns (T) signals (end_of_file, not_possible(string)) 

The decode operations are input operations. They decode the information written by 
encode operations and return an object "similar" to the one encoded. If the sequence 
of decode operations used to read a file does not match the sequence of encode 
operations used to write it, then meaningless objects may be returned. The system 
may in some cases be able to detect this condition, in which case the decode operation 
will signal not_possible("bad format"). The system is not guaranteed to detect all 
such errors. 
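
The requirement that decode operations mirror the encode operations can be seen in 
the following sketch, which writes and reads a (string, int) pair. The procedure names 
save_entry and load_entry are hypothetical. 

```clu
% save_entry and load_entry are hypothetical helpers for this sketch.
save_entry = proc (ist: istream, name: string, count: int)
             signals (not_possible(string))
    begin
        string$encode(name, ist)     % first the string ...
        int$encode(count, ist)       % ... then the int
        end resignal not_possible
    end save_entry

load_entry = proc (ist: istream) returns (string, int)
             signals (end_of_file, not_possible(string))
    begin
        % Decode in exactly the order used by save_entry.
        name: string := string$decode(ist)
        count: int := int$decode(ist)
        return(name, count)
        end resignal end_of_file, not_possible
    end load_entry
```

Reversing the two decode invocations in load_entry would be an instance of the 
mismatch described above, which the system is not guaranteed to detect. 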

The istream operations are: 

open: proctype (file_name, string) returns (istream) signals (not_possible(string)) 

The possible access modes (arg2) are "read", "write", and "append". If arg2 is not 
one of these strings, not_possible("bad access mode") is signalled. In those cases 
where the system is able to detect that the specified pre-existing file is not an image 
file, not_possible("wrong file type") is signalled. 

If the mode is "read", then the named file must exist. If the file exists, an image 
stream is returned upon which decode operations can be performed. 

If the mode is "write", a new file is created or an old file is rewritten. An image 
stream is returned upon which encode operations can be performed. 

If the mode is "append", then if the named file does not exist, one is created. An 
image stream is returned, positioned at the end of the file, upon which encode 
operations can be performed. Append mode to storage files should guarantee 
exclusive access to the file, if possible. 

can_read: proctype (istream) returns (bool) 

Can_read returns true if decode operations appear possible on the istream. 

can_write: proctype (istream) returns (bool) 

Can_write returns true if encode operations appear possible on the istream. 

empty: proctype (istream) returns (bool) 

Returns true if and only if there are no more objects in the file. 

reset: proctype (istream) signals (not_possible( string)) 

This operation resets the istream so that the next input or output operation will read 
or write the first item in the file. 

flush: proctype (istream) 

Any buffered output is written to the file, if possible. Otherwise, there is no effect. 

get_date: proctype (istream) returns (date) signals (not_possible(string)) 

This operation returns the date of the last modification of the corresponding storage 
file. 

set_date: proctype (istream, date) signals (not_possible( string)) 

This operation sets the modification date of the corresponding storage file. (The 
modification date is set automatically when a file is opened in "write" or "append" 
mode.) 

get_name: proctype (istream) returns (file_name) 

This operation returns the name of the corresponding file. It may be different than 
the name used to open the file, in that defaults have been resolved and link 
indirections have been followed. 



close: proctype (istream) 

This operation terminates I/O and removes the association between the istream and 
the file. Further use of operations that signal not_possible will signal not_possible. 

is_closed: proctype (istream) returns (bool) 

This operation returns true iff the istream is closed. 

equal: proctype (istream, istream) returns (bool) 

Returns true if and only if both arguments are the same istream. 

similar: proctype (istream, istream) returns (bool) 

Returns true if and only if both arguments are the same istream. 

copy: proctype (istream) returns (istream) 

Returns its argument. 

III.7. Terminal I/O 

Terminal I/O is performed via streams attached to interactive terminals. Such a stream is 
normally obtained as an argument to the top-level procedure of a program. A terminal stream is 
capable of performing both input and output operations. A number of additional operations are 
possible on terminal streams, and a number of standard operations have special interpretations. 

Terminal input will normally be buffered so that the user may perform editing functions, such 
as deleting the last character on the current line, deleting the current line, redisplaying the current 
line, and redisplaying the current line after clearing the screen. Specific characters for causing 
these functions are not suggested. In addition, some means must be provided for the user to 
indicate end-of-file, so that a terminal stream can be given to a program that expects an arbitrary 
stream and reads it until end-of-file. The end-of-file status of a stream is cleared by the reset 
operation. 

Input buffering is normally provided on a line basis. When a program first asks for input 
(using getc, for example) an entire line of input is read from the terminal and stored in an internal 
buffer. Further input is not taken from the terminal until the existing buffered input is read. 

However, new input caused to be read by the getbuf operation will be buffered as a unit. 
Thus, one can read in a large amount of text and allow "editing" of the entire amount of text. In 
addition, when the internal buffer is empty, the getc_image operation will read a character directly 
from the terminal, without interpreting it or echoing it. 

The user may specify a prompt string to be printed whenever a new buffer of input is 
requested from the terminal; the prompt string will also be reprinted when redisplay of the current 
line is requested by the user. However, if at the time that new input is requested an unfinished 
line has been output to the terminal, then that unfinished line is used instead as a prompt. 

The routine putc_image can be used to cause control functions, e.g., '\007' (bell) and '\p' 
(new-page or clear-screen). We cannot guarantee the effect caused by any particular control 
character, but we recommend that the standard ASCII interpretation of control characters be 
supported wherever possible. 

Terminal output may be buffered by the system up to one line at a time. However, the buffer 
must be flushed when new input is requested from the terminal. 

Terminal streams do not have modification dates. Terminal streams should have file names 
and implicit line numbers. 

Additional operations: 

getbuf: proctype (stream, string) returns (string) signals (end_of_file, not_possible(string)) 

This operation is the same as gets, except that for terminals with input buffering, 
the entire input read by getbuf is buffered as a unit, allowing input editing of the 
entire text. 

get_prompt: proctype (stream) returns (string) 

This operation returns the current prompt string. The prompt string is initially 
empty (""). The empty string is returned for non-terminal streams. 

set_prompt: proctype (stream, string) 

This operation sets the string to be used for prompting. If not possible, there is no 
effect. 

get_input_buffered: proctype (stream) returns (bool) 

This operation returns true iff the stream is attached to a terminal and input is 
being buffered. 

set_input_buffered: proctype (stream, bool) signals (not_possible(string)) 

This operation sets the input buffering mode. 

get_output_buffered: proctype (stream) returns (bool) 

This operation returns true iff the stream is attached to a terminal and output is 
being buffered. 

set_output_buffered: proctype (stream, bool) signals (not_possible( string)) 

This operation sets the output buffering mode. Unbuffered output is useful for 
programs that output incomplete lines as they are working to allow the user to watch 
the progress of the program. 
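
The prompt operations might be used as in the following sketch, which asks a question 
on a terminal stream and restores the previous prompt afterwards. The procedure name 
ask is hypothetical. 

```clu
% ask is a hypothetical helper: read one line from st, prompting with question.
ask = proc (st: stream, question: string) returns (string)
      signals (end_of_file, not_possible(string))
    old: string := stream$get_prompt(st)     % remember the current prompt
    stream$set_prompt(st, question)
    answer: string := stream$getl(st)
        except when end_of_file:
                    stream$set_prompt(st, old)
                    signal end_of_file
               when not_possible (why: string):
                    stream$set_prompt(st, old)
                    signal not_possible(why)
               end
    stream$set_prompt(st, old)               % restore the previous prompt
    return(answer)
    end ask
```

On a non-terminal stream set_prompt has no effect, so the procedure simply reads the 
next line. 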

III.8. Miscellaneous Procedures 

working_dir: proctype returns (string) 

This procedure returns the current working directory. A null directory in a file 
name denotes the current working directory. 

set_working_dir: proctype (string) signals (bad_format, not_possible(string)) 

This procedure is used to change the working directory. 

delete_file: proctype (file_name) signals (not_possible(string)) 

This procedure deletes the specified storage file. An exception may be signalled 
even if the specified file does not exist, but an exception will not be signalled solely 
because the file does not exist. For example, an exception may be signalled if the 
specified directory does not exist or if the user does not have access to the directory. 

rename_file: proctype (file_name, file_name) signals (not_possible(string)) 

This procedure renames the file specified by arg1 to have the name specified by 
arg2. Renaming across directories and devices may or may not be allowed. 

user_name: proctype returns (string) 

This procedure returns some identification of the user who is associated with the 
executing process. 

now: proctype returns (date) 

This procedure returns the current date and time. 

e_form: proctype (real, int, int) returns (string) signals (illegal_field_width) 

E_form returns a real literal of the form: 

[-]i_field[.f_field]e±x_field 

where i_field is arg2 digits, f_field is arg3 digits, and x_field is Exp_width digits 
(see the description of reals in Appendix II). If arg3 = 0, then the decimal point and 
f_field are not present. If arg1 ≠ 0.0, then the leftmost digit of i_field is not zero. 
If arg1 = 0.0, then x_field is all zeros. Illegal_field_width occurs if arg2 < 0 or 
arg3 < 0 or arg2 + arg3 < 1. If necessary, arg1 may be rounded to fit the specified form. 

f_form: proctype (real, int, int) returns (string) signals (illegal_field_width, insufficient_field_width) 

F_form returns a real literal of the form: 

[-]i_field.f_field 

where f_field is arg3 digits. If arg2 > 0, then i_field is at least one digit, with 
leading zeros suppressed. If arg2 = 0, then i_field is not present. Illegal_field_width 
occurs if arg2 < 0 or arg3 < 0 or arg2 + arg3 < 1. If necessary, arg1 may be rounded 
to fit the specified form. Insufficient_field_width occurs if 
real$exponent(arg1) ≥ arg2 after any rounding. 

g_form: proctype (real, int, int) returns (string) signals (illegal_field_width, insufficient_field_width) 

If arg1 = 0.0 or -1 < real$exponent(arg1) < arg2, then the result returned by this 
routine is f_form(arg1, arg2, arg3). Otherwise, the result is 
e_form(arg1, 1, arg2+arg3-Exp_width-3). Illegal_field_width occurs if arg2 < 0 or 
arg3 < 0 or arg2 + arg3 < 1. If necessary, arg1 may be rounded to fit the specified 
form. Insufficient_field_width occurs if arg1 ≠ 0.0 and 
~(-1 < real$exponent(arg1) < arg2) and (arg2 + arg3 < Exp_width + 3) after any 
rounding. 
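
The following sketch shows one way a program might combine these procedures by 
hand: it tries a fixed-point form first and falls back to an exponent form when the 
value does not fit. The procedure name put_real and the field widths are assumptions 
chosen for illustration. 

```clu
% put_real is a hypothetical helper; the widths 6, 2, 5, and 12 are arbitrary.
put_real = proc (st: stream, x: real) signals (not_possible(string))
    s: string
    % Up to 6 integer digits and exactly 2 fraction digits, if x fits.
    s := f_form(x, 6, 2)
        except when insufficient_field_width:
                    % Otherwise 1 integer digit and 5 fraction digits.
                    s := e_form(x, 1, 5)
               end
    stream$putright(st, s, 12)
        resignal not_possible
    end put_real
```

With these constant arguments illegal_field_width and negative_field_width cannot 
occur, so they are not handled. 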

III.9. Dates 

Dates are immutable objects that represent calendar dates and times. The operations for dates 
are: 

create: proctype (int, int, int, int, int, int) returns (date) signals (bad_format) 

The arguments are (in order) day, month, year, hours, minutes, and seconds. 

get_all: proctype (date) returns (int, int, int, int, int, int) 

Returns the components in the same order as given to create. 

get_day: proctype (date) returns (int) 
get_month: proctype (date) returns (int) 
get_year: proctype (date) returns (int) 
get_hour: proctype (date) returns (int) 
get_minute: proctype (date) returns (int) 
get_second: proctype (date) returns (int) 

These operations return the individual components. The results lie in the ranges 
(1 .. 31), (1 .. 12), (1 .. ), (0 .. 23), (0 .. 59), (0 .. 59), respectively. 

unparse: proctype (date) returns (string) 

e.g., "12 January 1978 01:36:59" 

unparse_date: proctype (date) returns (string) 

e.g., "12 January 1978" 

unparse_time: proctype (date) returns (string) 

e.g., "01:36:59" 

equal: proctype (date, date) returns (bool) 

The obvious equal. 

similar: proctype (date, date) returns (bool) 

Returns date$equal(arg1, arg2). 

copy: proctype (date) returns (date) 

Returns argl. 

lt: proctype (date, date) returns (bool) 
le: proctype (date, date) returns (bool) 
ge: proctype (date, date) returns (bool) 
gt: proctype (date, date) returns (bool) 

The obvious relational operations; if date1 < date2, then date1 occurs earlier than 
date2. 
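
As an illustration, dates can be compared to decide whether a file has been modified 
since a given day. This is a sketch; modified_since is a hypothetical procedure name. 

```clu
% modified_since is a hypothetical helper: was the file behind st modified
% on or after midnight at the start of the given day?
modified_since = proc (st: stream, day, month, year: int) returns (bool)
                 signals (bad_format, not_possible(string))
    % Argument order for create: day, month, year, hours, minutes, seconds.
    cutoff: date := date$create(day, month, year, 0, 0, 0)
        resignal bad_format
    return(stream$get_date(st) >= cutoff)
        resignal not_possible
    end modified_since
```

The comparison uses date$ge, so a modification time exactly equal to the cutoff counts 
as modified. 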

Appendix IV - Examples 

IV.1. Priority Queue Cluster 

This cluster is an implementation of priority queues. It inserts elements in O(log₂ n) time, and 
removes the "best" element in O(log₂ n) time, where n is the number of items in the queue, and 
"best" is determined by a total ordering predicate that the queue is created with. 

The queue is conceptually implemented as a binary tree, balanced such that every element is 
"better" than its descendants, and such that the minimum depth of the tree differs from the 
maximum depth by at most one. The tree is actually represented by keeping the elements in an 
array, with the left son of a[i] in a[i*2], and the right son in a[i*2+1]. The root of the tree, a[1], is 
the "best" element. 

Each insertion or deletion must rebalance the tree. Since the tree is of depth strictly less than 
log₂ n, the number of comparisons is less than log₂ n for insertion and less than 2 log₂ n for 
removal of an element. Consequently, a sort using this technique takes less than 3 n log₂ n 
comparisons. 

This cluster illustrates the use of a type parameter, and the use of a procedure as an object. 

p_queue = cluster [t: type] is create, best, size, empty, insert, remove 

    pt = proctype (t, t) returns (bool) 
    at = array[t] 
    rep = struct[a: at, p: pt]    % 1 < i <= size(a) implies ~p(a[i], a[i/2]) 

    % Create a p_queue with a particular sorting predicate. P should be a transitive, 
    % non-reflexive, total order. P(x, y) means that x is better than y. Each element in 
    % the p_queue should be better than its sons. However, this may not be true if 
    % mutable elements are changed while in the p_queue. 

    create = proc (p: pt) returns (cvt) 
        return(rep${a: at$new(), p: p})    % Low index of array must be 1! 
        end create 

    % Return the best element. 

    best = proc (x: cvt) returns (t) signals (empty) 
        return(at$bottom(x.a)) 
            except when bounds: signal empty end 
        end best 

    % Return the number of elements. 

    size = proc (x: cvt) returns (int) 
        return(at$size(x.a)) 
        end size 

    % Return true if there are no elements. 

    empty = proc (x: cvt) returns (bool) 
        return(at$empty(x.a)) 
        end empty 

    % Insert an element of type t. 

    insert = proc (x: cvt, v: t) 
        a: at := x.a 
        p: pt := x.p 
        at$addh(a, v)                        % Make room for new item 
        son: int := at$high(a)               % Tentative index of v 
        dad: int := son/2                    % Get index of v's father 
        while dad > 0 cand p(v, a[dad]) do   % While v better than father 
            a[son] := a[dad]                 % Move father down 
            son, dad := dad, dad/2           % Get new son, father indexes 
            end 
        a[son] := v                          % Insert the element into place 
        end insert 







    % Remove the best element and return it. 

    remove = proc (x: cvt) returns (t) signals (empty) 
        a: at := x.a 
        p: pt := x.p 
        r: t := at$bottom(a)                   % Save best for later return 
            except when bounds: signal empty end 
        v: t := at$remh(a)                     % Shrink array; save element 
        max_son: int := at$size(a)             % Last son node 
        if max_son = 0 then return(r) end      % If now empty, we're done 
        max_dad: int := max_son/2              % Last node with a son 
        dad: int := 1                          % Tentative index of v 
        while dad <= max_dad do                % While node has a son 
            son: int := dad*2                  % Get the first son 
            sval: t := a[son] 
            if son < max_son                   % If there is a second son 
               then nsval: t := a[son + 1]     % Find the best son 
                    if p(nsval, sval) then son, sval := son + 1, nsval end 
               end 
            if ~p(sval, v) then break end      % If son doesn't beat v, we're done 
            a[dad] := sval                     % Move son up 
            dad := son                         % Move v down 
            end 
        a[dad] := v                            % Insert the element into place 
        return(r)                              % Return the previous best element 
        end remove 

    end p_queue 
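
The sort mentioned in the introduction to this cluster can be sketched as follows. The 
procedure sort_ints is a hypothetical client, not part of the cluster; it passes int$lt as 
the predicate, so that "better" means smaller. 

```clu
% sort_ints is a hypothetical client of p_queue: sort a into ascending order.
sort_ints = proc (a: array[int])
    ai = array[int]
    pq = p_queue[int]
    q: pq := pq$create(int$lt)      % smaller ints are "better"
    % Insert every element; a itself is not modified by this loop.
    for i: int in ai$elements(a) do
        pq$insert(q, i)
        end
    % Remove the elements best-first, storing them back from the low index up.
    for i: int in ai$indexes(a) do
        a[i] := pq$remove(q)
        end
    end sort_ints
```

The queue holds exactly as many elements as the array, so remove never signals empty 
here. Passing int$gt instead would sort into descending order. 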

IV.2. Text Formatter 

The following program is a simple text formatter. The input consists of a sequence of 
unformatted text lines mixed with command lines. Each line (except possibly the last) is terminated 
by a newline character, and command lines begin with a period to distinguish them from text lines. 
For example: 

    Justification only occurs in "fill" mode. 
    In "nofill" mode, each input text line is output without modification. 
    The .br command causes a line-break. 
    .br 
    Just like this. 

The program produces justified, indented, and paginated text. For example: 

    Justification only occurs in "fill" mode. In "nofill" mode, 
    each input text line is output without modification. The .br 
    command causes a line-break. 
    Just like this. 

The output text is indented 10 spaces from the left margin, and is divided into pages of 50 text 
lines each. Each output line has 60 characters. A header of 5 lines, including a line giving the 
page number, is output at the beginning of each page. 

An input text line consists of a sequence of words and word-break characters. The 
word-break characters are space, tab, and newline; all other characters are constituents of words. 
Tab stops are considered to be every eight spaces. 
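
As a small illustration of the tab-stop rule, the width contributed by a tab depends on how many characters are already on the line. The sketch below is in Python rather than CLU, and the names `TAB_WIDTH` and `tab_width` are hypothetical.

```python
TAB_WIDTH = 8  # tab stops are assumed to fall every eight columns


def tab_width(length):
    """Return how many columns a tab occupies on a line already `length` wide."""
    return TAB_WIDTH - (length % TAB_WIDTH)
```

A tab at the start of a line therefore occupies eight columns, while a tab after three characters occupies five.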

Tabs and spaces are accumulated in the current output line along with the input words. Thus, 
if two spaces occur in the input between two words and those words appear on the same output 
line, then they will be separated by at least two spaces. 

The formatter has two basic modes of operation. In "nofill" mode, each input text line is 
output without modification. In "fill" mode, input is accepted until no more words can fit on the 
current output line. Newline characters are treated essentially as spaces. The line is then justified 
by adding extra spaces between words until the last word has its last character in the rightmost 
position of the line. Initially the formatter is in fill mode. 
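
The fill-mode rule for deciding when a line is full can be sketched as a greedy loop. This Python fragment is a simplified illustration only: it assumes words separated by single spaces, ignores tabs and justification, and the name `fill` is hypothetical.

```python
def fill(words, width=60):
    """Group words into lines of at most `width` characters where possible."""
    lines = []
    current = []
    length = 0                       # length of the current line so far
    for w in words:
        # Start a new line only if the current one is non-empty and the
        # word (plus its separating space) would not fit.
        if current and length + 1 + len(w) > width:
            lines.append(" ".join(current))
            current, length = [], 0
        length += len(w) + (1 if current else 0)
        current.append(w)
    if current:
        lines.append(" ".join(current))
    return lines
```

A word longer than the line length ends up on a line by itself, since it is always accepted onto an empty line.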

Justification is performed by enlarging spaces between words, as evenly as possible. Enlarging 
is performed alternately from the right and the left, starting from the right at the top of each page. 
Only spaces to the right of all tabs and between words are subject to justification. Furthermore, 
spaces preceding the first word following a tab are not subject to justification. If there are no 
spaces subject to justification, then no justification is performed and no error message is produced. 
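
The even distribution of extra spaces can be sketched as follows. This Python fragment is an illustration under simplifying assumptions (single spaces between words, no tabs); the names `enlarge` and `justify_line` are hypothetical.

```python
def enlarge(gaps, diff, right_to_left):
    """Add `diff` columns to the gap widths as evenly as possible."""
    out = list(gaps)
    if not out:
        return out
    each, extra = divmod(diff, len(out))   # share per gap, and leftovers
    if right_to_left:
        order = range(len(out) - 1, -1, -1)
    else:
        order = range(len(out))
    for i in order:
        out[i] += each
        if extra > 0:                      # hand out leftovers one at a time
            out[i] += 1
            extra -= 1
    return out


def justify_line(words, width, right_to_left):
    """Pad the gaps between words so the line is exactly `width` wide."""
    gaps = [1] * (len(words) - 1)
    diff = width - (sum(len(w) for w in words) + len(gaps))
    if diff <= 0 or not gaps:
        return " ".join(words)             # nothing to justify
    gaps = enlarge(gaps, diff, right_to_left)
    return "".join(w + " " * g for w, g in zip(words, gaps)) + words[-1]
```

Alternating `right_to_left` on successive lines keeps the leftover spaces from piling up on one side of the page.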

In fill mode, any input line that starts with a word-break character causes a line-break: the
current output line is neither filled nor adjusted, but is output as is. An "empty" input line (one
starting with a newline character) causes a line-break and then causes a blank line to be output.

In nofill mode, if an input line is longer than the line length, it is output as given with no
error message. In fill mode, if a word is longer than the line length, it is output as given on a line
by itself with no error message.

The formatter accepts three different commands:

    .br - causes a line-break
    .nf - causes a line-break, and changes the mode to "nofill"
    .fi - causes a line-break, and changes the mode to "fill"

An unrecognized command name causes an error message and is otherwise ignored.
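
Command recognition can be sketched in Python as shown here. The action labels, the error message wording, and the names `ACTIONS` and `parse_command` are assumptions made for illustration.

```python
import re

# Hypothetical labels for the three formatting actions.
ACTIONS = {"br": "break", "nf": "nofill", "fi": "fill"}


def parse_command(line):
    """Return the action named by a command line such as '.br'.

    The command name runs from just after the period up to the first
    space, tab, or newline; the rest of the line is ignored.
    """
    name = re.split(r"[ \t\n]", line[1:], maxsplit=1)[0]
    if name == "":
        raise ValueError("missing command")
    if name not in ACTIONS:
        raise ValueError("'" + name + "' not a command")
    return ACTIONS[name]
```

A caller would catch the error, report it together with the input line number, and carry on with the next line.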

The program performs input and output on streams. 

Fig. 8. Module Dependency Diagram

% Read the instream, processing it and placing the output on outstream and writing error messages
% on errstream.

format = proc (instream, outstream, errstream: stream) signals (bad_arg(string))
    if ~stream$can_read(instream) then signal bad_arg("input stream")
     elseif ~stream$can_write(outstream) then signal bad_arg("output stream")
     elseif ~stream$can_write(errstream) then signal bad_arg("error stream")
     end
    d: doc := doc$create(outstream)
    line: int := 0
    while ~stream$empty(instream) do
        line := line + 1
        do_line(instream, d)
            except when error (why: string):
                        stream$putl(errstream, int$unparse(line) || ":\t" || why)
                   end
        end
    doc$terminate(d)
    end format

% Process an input line. The line is processed either as a text line or as a command line,
% depending upon whether or not the first character of the line is a period.

do_line = proc (instream: stream, d: doc) signals (error(string))
    c: char := stream$peekc(instream)
    if c = '.'
       then do_command(instream, d)
                resignal error
       else do_text_line(instream, d)
       end
    end do_line

% Process a command line. This procedure reads up to the first space or tab in a line and
% processes the string read as a command. The remainder of the line is read and discarded.

do_command = proc (instream: stream, d: doc) signals (error(string))
    stream$getc(instream)                       % skip the period
    n: string := stream$gets(instream, " \t\n")
        except when end_of_file: n := "" end
    stream$getl(instream)                       % read and discard remainder of input line
        except when end_of_file: end
    if n = "br" then doc$break_line(d)
     elseif n = "fi" then doc$set_fill(d)
     elseif n = "nf" then doc$set_nofill(d)
     elseif n = "" then signal error("missing command")
     else signal error("'" || n || "' not a command")
     end
    end do_command

% Process a text line. This procedure reads one line from instream and processes it as a text line.
% If the first character is a word-break character, then a line-break is caused. If the line is empty,
% then a blank line is output. Otherwise, the words and word-break characters in the line are
% processed in turn.

do_text_line = proc (instream: stream, d: doc)
    c: char := stream$getc(instream)
    if c = '\n'
       then doc$skip_line(d)                    % empty input line
            return
     elseif c = ' ' cor c = '\t'
       then doc$break_line(d)
     end
    while c ~= '\n' do
        if c = ' ' then doc$add_space(d)
         elseif c = '\t' then doc$add_tab(d)
         else w: word := word$scan(c, instream)
              doc$add_word(d, w)
         end
        c := stream$getc(instream)
        end except when end_of_file: end
    doc$add_newline(d)
    end do_text_line

% The doc cluster implements documents, the properly indented, justified, and paginated output of
% the text formatter. A document is constructed incrementally, using operations to add words,
% spaces, tabs, and newlines to the end of the document. Other operations are used for the basic
% formatting actions: break_line to cause a line break, skip_line to output a blank line, set_fill and
% set_nofill to set the formatting mode. Rather than collecting the entire document as a sequence
% of lines before outputting to a file, each line is output as it is produced. The current output line
% is maintained for the purposes of performing justification. To perform pagination and the
% production of headings, the current line number and the current page number are also
% maintained.

doc = cluster is create, add_word, add_space, add_tab, add_newline,
                 break_line, skip_line, set_fill, set_nofill, terminate

rep = record[line: line,            % The current line.
             fill: bool,            % True <==> in fill mode.
             r2l: bool,             % True <==> justify next line right-to-left.
             lineno: int,           % The number of lines output so far on this page
                                    % (not including any header lines).
             pageno: int,           % The number of the current output page.
             outstream: stream]     % The output stream.

chars_per_line = 60
lines_per_page = 50
left_margin_size = 10

% Create a doc object. The first page is number 1, there are no lines yet output on it. Fill mode is
% in effect.

create = proc (outstream: stream) returns (cvt)
    return(rep${line: line$create(),
                fill: true,
                r2l: true,
                lineno: 0,
                pageno: 1,
                outstream: outstream})
    end create

% Process a word. This procedure adds the word W to the output document. If in nofill mode,
% then the word is simply added to the end of the current line (there is no line-length checking in
% nofill mode). If in fill mode, then we first check to see if there is room for the word on the
% current line. If the word will not fit on the current line, we first justify and output the line and
% then start a new one; justification is performed alternately from the right and the left on
% successive lines. However, if the line is empty, then we just add the word to the end of the line:
% if the word won't fit on an empty line, then it won't fit on any line, so we have no choice but to
% put it on the current line, even if it doesn't fit.

add_word = proc (d: cvt, w: word)
    if d.fill cand ~line$empty(d.line)
       then if line$length(d.line) + word$width(w) > chars_per_line
               then line$justify(d.line, chars_per_line, d.r2l)
                    d.r2l := ~d.r2l
                    output_line(d)
               end
       end
    line$add_word(d.line, w)
    end add_word

% Process a space -- just add it to the current line.

add_space = proc (d: cvt)
    line$add_space(d.line)
    end add_space

% Process a tab -- just add it to the current line.

add_tab = proc (d: cvt)
    line$add_tab(d.line)
    end add_tab

% Process a newline. If in nofill mode, then the current line is output as is. Otherwise, a newline
% is treated just like a space.

add_newline = proc (d: cvt)
    if ~d.fill
       then output_line(d)
       else line$add_space(d.line)
       end
    end add_newline

% Cause a line break. If the line is not empty, then it is output as is. Line breaks have no effect
% on empty lines -- multiple line breaks are the same as one.

break_line = proc (d: cvt)
    if ~line$empty(d.line) then output_line(d) end
    end break_line

% Cause a line break and output a blank line.

skip_line = proc (d: cvt)
    break_line(up(d))
    output_line(d)                  % line is empty
    end skip_line

% Cause a line break and enter fill mode.

set_fill = proc (d: cvt)
    break_line(up(d))
    d.fill := true
    end set_fill

% Cause a line break and enter nofill mode.

set_nofill = proc (d: cvt)
    break_line(up(d))
    d.fill := false
    end set_nofill

% Terminate the output document.

terminate = proc (d: cvt)
    break_line(up(d))
    end terminate

% Internal routine.

% Output_line is used to keep track of the line number and the page number and to put out the
% header at the top of each page. At the top of each page, justification is reset to start from the
% right.

output_line = proc (d: rep)
    if d.lineno = 0
       then if d.pageno > 1
               then stream$putc(d.outstream, '\p') end
            stream$puts(d.outstream, "\n\n")            % print header
            stream$putspace(d.outstream, left_margin_size)
            stream$puts(d.outstream, "Page ")
            stream$puts(d.outstream, int$unparse(d.pageno))
            stream$puts(d.outstream, "\n\n\n")
       end
    d.lineno := d.lineno + 1
    if ~line$empty(d.line)
       then stream$putspace(d.outstream, left_margin_size)
            line$output(d.line, d.outstream)
       end
    stream$putc(d.outstream, '\n')
    if d.lineno = lines_per_page
       then d.r2l := true
            d.lineno := 0
            d.pageno := d.pageno + 1
       end
    end output_line

end doc

% A line is a mutable sequence of words, spaces, and tabs. The length of a line is the number of
% character positions that would be used if the line were output. One may output a line onto a
% stream, in which case the line is made empty after printing. One may also justify a line to a
% given length, which means that some spaces in the line will be enlarged to make the length of
% the line equal to the desired length. Only spaces to the right of all tabs are subject to
% justification. Furthermore, spaces preceding the first word in the output line or preceding the
% first word following a tab are not subject to justification. If there are no spaces subject to
% justification or if the line is too long, then no justification is performed and no error message is
% produced.

line = cluster is create, add_word, add_space, add_tab, length, empty, justify, output

token = variant[space: int,         % the int is the width of the space
                tab: int,           % the int is the width of the tab
                word: word]
at = array[token]

rep = record[length: int,           % the current length of the line
             stuff: at]             % the contents of the line
                                    % no two adjacent tokens will both be spaces

max_tab_width = 8                   % maximum chars per tab

% Create an empty line.

create = proc () returns (cvt)
    return(rep${length: 0,
                stuff: at$new()})
    end create

% Add a word at the end of the line.

add_word = proc (l: cvt, w: word)
    at$addh(l.stuff, token$make_word(w))
    l.length := l.length + word$width(w)
    end add_word

% Add a space at the end of the line, combining it with an existing trailing space, if any.

add_space = proc (l: cvt)
    l.length := l.length + 1
    tagcase at$top(l.stuff)
       tag space (width: int): token$change_space(at$top(l.stuff), width + 1)
                               return
       others:
       end except when bounds: end          % Handle empty array case.
    at$addh(l.stuff, token$make_space(1))
    end add_space

% Add a tab at the end of the line.

add_tab = proc (l: cvt)
    width: int := max_tab_width - (l.length // max_tab_width)
    l.length := l.length + width
    at$addh(l.stuff, token$make_tab(width))
    end add_tab

% Return the current length of the line.

length = proc (l: cvt) returns (int)
    return(l.length)
    end length

% Return true if the line is of length zero.

empty = proc (l: cvt) returns (bool)
    return(l.length = 0)
    end empty

% Justify the line, if possible, so that its length is equal to LEN. Before justification, any trailing
% space is removed. If the line length at that point is greater than or equal to the desired length, then
% no action is taken. Otherwise, the set of justifiable spaces is found, as described above. If there
% are no justifiable spaces, then no further action is taken. Otherwise, the justifiable spaces are
% enlarged, as evenly as possible, to make the line length the desired length. Enlarging is
% performed either from the right or the left, depending on R2L.

justify = proc (l: cvt, len: int, r2l: bool)
    tagcase at$top(l.stuff)
       tag space (width: int): at$remh(l.stuff)
                               l.length := l.length - width
       others:
       end except when bounds: end          % Handle empty array case.
    if l.length >= len then return end
    diff: int := len - l.length
    first: int := find_first_justifiable_space(l)
        except when none: return end
    enlarge_spaces(l, first, diff, r2l)
    end justify

% Output the line and reset it.

output = proc (l: cvt, outstream: stream)
    for t: token in at$elements(l.stuff) do
        tagcase t
           tag word (w: word): word$output(w, outstream)
           tag space, tab (width: int): stream$putspace(outstream, width)
           end
        end
    l.length := 0
    at$trim(l.stuff, 1, 0)
    end output

% Internal routines.

% Find the first justifiable space. This space is the first space after the first word after the last
% tab in the line. Return the index of the space in the array. Signal NONE if there are no
% justifiable spaces. Although no two adjacent tokens will both be words (as lines are currently
% used), no such assumption is made here.

find_first_justifiable_space = proc (l: rep) returns (int) signals (none)
    a: at := l.stuff
    if at$empty(a) then signal none end
    lo: int := at$low(a)
    hi: int := at$high(a)
    i: int := hi
    while i > lo cand ~token$is_tab(a[i]) do        % find last tab in the line (if any)
        i := i - 1
        end
    while i <= hi cand ~token$is_word(a[i]) do      % find first word after it (or first in line)
        i := i + 1
        end
    while i <= hi cand ~token$is_space(a[i]) do     % find first space after that
        i := i + 1
        end
    if i > hi then signal none end
    return(i)
    end find_first_justifiable_space

% Enlarge the spaces in the array whose indexes are at least FIRST. Add a total of DIFF extra
% character widths of space. Add spaces working from the right or the left, depending on R2L.

enlarge_spaces = proc (l: rep, first, diff: int, r2l: bool)
    nspaces, last: int := count_spaces(l, first)
    if nspaces = 0 then return end
    by: int := 1
    if r2l
       then by := -1
            first, last := last, first
       end
    neach: int := diff / nspaces            % Amount to increase each space.
    nextra: int := diff // nspaces          % Leftovers to be distributed.
    for i: int in int$from_to_by(first, last, by) do
        tagcase l.stuff[i]
           tag space (width: int): width := width + neach
                                   if nextra > 0
                                      then width := width + 1
                                           nextra := nextra - 1
                                      end
                                   token$change_space(l.stuff[i], width)
           others:
           end
        end
    l.length := l.length + diff
    end enlarge_spaces

% Return a count of the number of spaces in the line whose indexes in the array are at least IDX,
% and return the index of the last space counted.

count_spaces = proc (l: rep, idx: int) returns (int, int)
    count: int := 0
    for i: int in int$from_to(idx, at$high(l.stuff)) do
        tagcase l.stuff[i]
           tag space: count := count + 1
                      idx := i
           others:
           end
        end
    return(count, idx)
    end count_spaces

end line

% A word is an item of text. It may be output to a stream. It has a width, which is the number of
% character positions that are taken up when the word is printed.

word = cluster is scan, width, output

rep = string

% Construct a word whose first character is C and whose remaining characters are to be removed
% from the instream.

scan = proc (c: char, instream: stream) returns (cvt)
    s: string := string$c2s(c)
    s := s || stream$gets(instream, " \t\n")
        except when end_of_file: end
    return(s)
    end scan

% Return the width of the word.

width = proc (w: cvt) returns (int)
    return(string$size(w))
    end width

% Output the word.

output = proc (w: cvt, outstream: stream)
    stream$puts(outstream, w)
    end output

end word

IV.3. Text Substitution Program

The following (rather complex) program performs textual substitutions of one set of strings
for another throughout a file. It can be useful in expanding abbreviations, renaming variables,
correcting misspellings, etc.

Substitutions are specified by a list of rules read from a file. Each rule consists of a
left-hand-side (the string to be replaced) and a right-hand-side (the string to replace with),
separated by a ">" character. Each rule is terminated by a newline character. For example, to
substitute "BEGIN" for "begin" and "END" for "end", the rules would be:

    begin>BEGIN
    end>END

All substitutions are done simultaneously, so for example it is possible to substitute "a" for "b"
and "b" for "a". Substitution is not performed on the results of a substitution, only on the original
text. When performing substitutions, the rule with the longest left-hand-side always takes
precedence. Thus, given the two rules:

    abc>x
    a>y

an input of "abcab" would be transformed to "xyb".
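
The intended input/output behavior can be sketched with a naive scan in Python. This is not the pushdown transducer used by the program; it is a simple reference version that assumes leftmost, longest-match semantics capture the rules described above. The name `substitute` and the dictionary representation of rules are hypothetical.

```python
def substitute(text, rules):
    """Apply all rules at once, preferring the longest left-hand side.

    `rules` maps each left-hand side to its right-hand side. Replacement
    text is never rescanned, so rules cannot feed into one another.
    """
    lefts = sorted(rules, key=len, reverse=True)   # longest first
    out = []
    i = 0
    while i < len(text):
        for left in lefts:
            if left and text.startswith(left, i):
                out.append(rules[left])
                i += len(left)
                break
        else:
            out.append(text[i])                    # no rule applies here
            i += 1
    return "".join(out)
```

With the two rules above, `substitute("abcab", {"abc": "x", "a": "y"})` gives `"xyb"`, and swapping rules such as `{"a": "b", "b": "a"}` behave as expected because output is never rescanned.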

Within a rule, characters can be represented with the same escape sequences allowed in string
literals. For example, the following rule replaces each newline by two newlines:

    \n>\n\n

In addition, the escape sequence "\>" can be used to represent the character ">".
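
Rule parsing with escape sequences can be sketched in Python as follows. The set of single-character escapes, the mapping of "\p" to a form feed, and the three-digit octal form are assumptions based on the program listing; the names `SIMPLE`, `decode`, and `parse_rule` are hypothetical.

```python
# Assumed single-character escapes; "\p" is taken to mean a form feed.
SIMPLE = {"'": "'", '"': '"', "\\": "\\", ">": ">",
          "n": "\n", "t": "\t", "p": "\f", "b": "\b", "r": "\r", "v": "\v"}


def decode(part):
    """Replace escape sequences in one side of a rule by the chars they denote."""
    out = []
    i = 0
    while i < len(part):
        ch = part[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        nxt = part[i + 1:i + 2]
        if nxt in SIMPLE:
            out.append(SIMPLE[nxt])
            i += 2
        else:
            digits = part[i + 1:i + 4]             # expect three octal digits
            if len(digits) != 3 or any(d not in "01234567" for d in digits):
                raise ValueError("bad escape sequence")
            out.append(chr(int(digits, 8)))
            i += 4
    return "".join(out)


def parse_rule(line):
    """Split 'left>right' at the first unescaped '>' and decode both sides."""
    i = 0
    while i < len(line) and line[i] != ">":
        i += 2 if line[i] == "\\" else 1           # skip escaped characters
    if i >= len(line):
        raise ValueError("missing right side of rule")
    left = decode(line[:i])
    if left == "":
        raise ValueError("missing left side of rule")
    return left, decode(line[i + 1:])
```

For instance, `parse_rule(r"a\>b>c")` yields the pair `("a>b", "c")`, since only the unescaped ">" separates the two sides.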

The program asks for the name of a rule file, and then loops asking for pairs of input and
output file names to process using the given rules. If no input file is given, a new rule file is
requested. If no rule file is given, the program terminates. If no output file is given, a new input
file is requested.

The program is implemented using a pushdown transducer: a pushdown automaton extended
to produce output.

% Ask for a rule file and build a pushdown transducer for it, and then loop asking for pairs of
% input and output files and processing them using that pushdown transducer. When no input
% file is given, ask for a new rule file. When no rule file is given, terminate. When no output
% file is given, ask for a new input file.

substitute = proc ()
    tyo: stream := stream$primary_output()
    while true do
        rst: stream := get_stream("rule file: ", "read")
            except when refused: return end
        m: pdt := build_pdt(rst)
            except when illegal (line: int, why: string):
                        stream$close(rst)
                        stream$putl(tyo, int$unparse(line) || ":\t" || why)
                        continue
                   end
        stream$close(rst)
        while true do
            inst: stream := get_stream("input file: ", "read")
                except when refused: break end
            outst: stream := get_stream("output file: ", "write")
                except when refused: stream$close(inst)
                                     continue
                       end
            run_pdt(inst, outst, m)
            stream$close(outst)
            stream$close(inst)
            end
        end
    end substitute

% Read in a file_name and open the file in the given mode. Signal refused if no filename is
% given.

get_stream = proc (prompt, mode: string) returns (stream) signals (refused)
    tyi: stream := stream$primary_input()
    tyo: stream := stream$primary_output()
    tyi.input_buffered := true
    while true do
        stream$puts(tyo, prompt)
        fs: string := stream$getl(tyi)
        if string$empty(fs)
           then signal refused end
        return(stream$open(file_name$parse(fs), mode))
            except when bad_format: stream$putl(tyo, "bad format file name")
                   when not_possible (s: string): stream$putl(tyo, s)
                   end
        end except when end_of_file: signal refused end
    end get_stream

% Read and parse the rules from the given stream. Construct and return a pushdown transducer
% corresponding to those rules.

build_pdt = proc (st: stream) returns (pdt) signals (illegal(int, string))
    rule = struct[left, right: string]
    rulelist = array[rule]
    rules: rulelist := rulelist$new()
    line: int := 1
    while true do
        while stream$peekc(st) = '\n' do
            stream$getc(st)
            line := line + 1
            end except when end_of_file: return(pdt$create(rules)) end
        left: string := get_rule_part(st, ">\n")
        if string$empty(left)
           then signal illegal(line, "missing left side of rule") end
        if stream$empty(st) cor stream$getc(st) ~= '>'
           then signal illegal(line, "missing right side of rule") end
        right: string := get_rule_part(st, "\n")
        rulelist$addh(rules, rule${left: left, right: right})
        end except when illegal (why: string): signal illegal(line, why) end
    end build_pdt

% Parses a rule part up to but not including the given terminators. Accepts the regular escape
% sequences, plus "\>" to represent ">".

get_rule_part = proc (st: stream, terms: string) returns (string) signals (illegal(string))
    terms := string$append(terms, '\\')
    part: string := ""
    while true do
        begin
            part := part || stream$gets(st, terms)
            if stream$peekc(st) ~= '\\'
               then return(part) end
            end except when end_of_file: return(part) end
        c: char := stream$getc(st)
        x: int := string$indexc(stream$peekc(st), "'\"\\>ntpbrv")
        if x > 0
           then stream$getc(st)
                c := "'\"\\>\n\t\p\b\r\v"[x]
           else sum: int := 0
                for i: int in int$from_to(1, 3) do
                    c := stream$getc(st)
                    if c < '0' cor c > '7'
                       then exit illegal_char end
                    sum := sum * 8 + char$c2i(c) - char$c2i('0')
                    end
                c := char$i2c(sum)
           end
        part := string$append(part, c)
        end
       except when end_of_file, illegal_char: signal illegal("bad escape sequence")
              end
    end get_rule_part

% Perform all substitutions on a file.

run_pdt = proc (inst, outst: stream, m: pdt)
    while true do
        pdt$move(m, stream$getc(inst))
            except when output (s: string): stream$puts(outst, s) end
        end except when end_of_file: stream$puts(outst, pdt$reset(m)) end
    end run_pdt

% A pushdown transducer is a collection of states connected by transitions. A transition can also
% connect a state to an output condition, with the initial state as the implicit next state. A
% transition is labeled with both an input character and a set of lookahead characters; the
% transition is to be followed if the current input character matches and the current lookahead
% character is in the lookahead set. The basic operation of the transducer is move, which moves
% according to the current input character (at the top of the pushdown list), and the current
% lookahead character (given as an argument). Output is produced by signalling with a string
% result.

pdt = cluster is create, move, reset

rep = record[first: state,          % initial state
             buffer: buf,           % path from initial state to current state
                                    % plus next input char
             current: state]        % current state

rule = struct[left, right: string]
rulelist = array[rule]
buf = array[char]

% Two phase construction. First construct all states and transitions needed to follow any single
% rule from the initial state to its output condition. Then fill in missing cross-transitions for rules
% that interact with each other, in (approximately) the following manner. For each substring of a
% left-hand side of a rule (a path from some state S3 to some state S2) that is also a prefix of a
% left-hand side of a rule (a path from the initial state to some state S1), add all transitions out of
% S1 (not conflicting with existing transitions out of S2) as transitions out of S2.

create = proc (rules: rulelist) returns (cvt) signals (illegal(string))
    first: state := state$create()
    for r: rule in rulelist$elements(rules) do
        add_rule(first, r)
        end resignal illegal
    for path: string, s2: state in all_states(first) do
        for s1: state in all_suffix_states(path, first) do
            replicate(s1, s2)
            end
        end
    return(rep${first: first, buffer: buf$new(), current: first})
    end create

% Make a move with the given char as the lookahead input. If a rule is recognized (an output
% condition is reached), the left side of the rule is discarded from the end of the buffered input,
% and any remaining input is concatenated with the right side of the rule and returned for output.
% If no rule can match the current buffered input, the entire buffered input is returned for
% output.

move = proc (m: cvt, peek: char) signals (output(string))
    m.current := state$move(m.current, buf$top(m.buffer), peek)
        except when output (size: int, out: string):
                    buf$trim(m.buffer, 1, buf$size(m.buffer) - size)
                    out := reset1(m) || out
                    buf$addh(m.buffer, peek)
                    signal output(out)
               when no_match:
                    out: string := reset1(m)
                    buf$addh(m.buffer, peek)
                    signal output(out)
               when bounds:
               end
    buf$addh(m.buffer, peek)
    end move

% Force input termination. Returns any final output. Restores the pdt to its initial state.

reset = proc (m: cvt) returns (string)
    extra: string := ""
    m.current := state$move1(m.current, buf$top(m.buffer))
        except when output (size: int, out: string):
                    buf$trim(m.buffer, 1, buf$size(m.buffer) - size)
                    extra := out
               when no_match, bounds:
               end
    return(reset1(m) || extra)
    end reset

% Internal routine.

% Return current buffered input. Reset current state to initial state.

reset1 = proc (m: rep) returns (string)
    s: string := string$ac2s(m.buffer)
    buf$trim(m.buffer, 1, 0)
    m.current := m.first
    return(s)
    end reset1

end pdt

% Add a new rule. Follow existing path through pdt as far as possible, and then add new states.
% Just add states and transitions needed to follow the rule from the initial state to the output
% condition; do not add cross-transitions for interacting rules.

add_rule = proc (s: state, r: rule) signals (illegal(string))
    rule = struct[left, right: string]
    left: string := r.left
    if string$empty(left)
       then signal illegal("rule has empty left side") end
    size: int := string$size(left)
    i: int := 1
    peeks: string := ""
    while i < size do
        s := state$move(s, left[i], left[i + 1])
        i := i + 1
        end except when output (*): peeks := string$c2s(left[i + 1])
                   when no_match:
                   end
    while i < size do
        ns: state := state$create()
        state$add_move(s, left[i], peeks, ns)
        s := ns
        i := i + 1
        peeks := ""
        end
    state$add_output(s, left[size], size, r.right)
        except when illegal: signal illegal("conflicting rules") end
    end add_rule

% Traverse depth first left to right, yielding all path-state pairs reachable from given state. Depth
% first traversal is used to satisfy the requirement that the rule with the longest left-hand side
% takes precedence.

all_states = iter (s: state) yields (string, state)
    for input: char, peeks: string, next: state in state$all_moves(s) do
        pre: string := string$c2s(input)
        for path: string, ns: state in all_states(next) do
            yield(pre || path, ns)
            end
        yield(pre, next)
        end
    end all_states




% Given a string, follow all proper suffixes (longest first) of the string as paths from the given
% state, and yield the final state reached by each legal path. The suffixes are done longest first to
% satisfy the requirement that the rule with the longest left-hand side takes precedence.

all_suffix_states = iter (path: string, first: state) yields (state)
    size: int := string$size(path)
    for i: int in int$from_to(2, size) do
        s: state := first
        j: int := i
        while j < size do
            s := state$move(s, path[j], path[j + 1])
            j := j + 1
            end except others: continue end
        s := state$move1(s, path[j])
            except others: continue end
        yield(s)
        end
    end all_suffix_states

% For each input char causing a transition out of S1 but not causing a transition out of S2, add a
% transition out of S2.

replicate = proc (s1, s2: state)
    for input: char, peeks: string, s: state in state$all_moves(s1) do
        state$move1(s2, input)
            except when output (*): continue
                   when no_match:
                   end
        state$add_move(s2, input, peeks, s)
            except others: end
        end
    for input: char, size: int, out: string in state$all_outputs(s1) do
        state$add_output(s2, input, size, out)
            except others: end
        end
    end replicate

% A state is a collection of arcs, each labeled with the input character required to take the
% transition. An arc either points to a new state, or indicates an output condition (with the initial
% state as the implicit new state). For arcs to new states, a list of acceptable lookahead characters is
% also present, with an empty list indicating "all others". An output condition implicitly carries an
% "all others" lookahead list. There are operations to add new transitions, iterate over the
% transitions, and move to a new state given the current input and lookahead.

state = cluster is create, all_moves, add_move, all_outputs, add_output, move, move1

rep = array[trans]                  % a state is a set of transitions
trans = struct[input: char,         % a transition is a labeled arc
               next: arc]
arc = oneof[state: pstate,          % an arc is to a new state
            output: output]         % or to an output condition
pstate = record[peeks: string,      % empty lookahead means "all others"
                state: state]
output = struct[size: int,          % size of left side of rule
                out: string]        % right side of rule
                                    % implicit "all others" lookahead

% Create a new state with no transitions.

create = proc () returns (cvt)
    return(rep$new())
    end create

% Yield all transitions (input, lookaheads, next state) from the given state to new states.

all_moves = iter (s: cvt) yields (char, string, state)
    for t: trans in rep$elements(s) do
        tagcase t.next
           tag state (ps: pstate): yield(t.input, ps.peeks, ps.state)
           tag output:
           end
        end
    end all_moves

% Add a transition from one state to another for the given input and that subset of the given list
% of lookahead chars not present on existing transitions for the given input. The addition is
% illegal if all of the lookaheads are already accounted for by existing transitions. An empty
% lookahead list denotes "all others not specified on other transitions for the same input".

add_move = proc (from: cvt, input: char, peeks: string, to: state) signals (illegal)
    rpeeks: string := peeks
    for t: trans in rep$elements(from) do
        if t.input = input
           then tagcase t.next
                   tag state (ps: pstate): if string$empty(ps.peeks)
                                              then signal illegal
                                              else rpeeks := strip(rpeeks, ps.peeks)
                                              end
                   tag output: if string$empty(peeks)
                                  then signal illegal end
                   end
           end
        end
    if string$empty(rpeeks) cand ~string$empty(peeks)
       then signal illegal end
    rep$addl(from, trans${input: input,
                          next: arc$make_state(pstate${peeks: peeks,
                                                       state: to})})
    end add_move

% Yield all transitions (input, size, output) from the given state to output conditions.

all_outputs = iter (s: cvt) yields (char, int, string)
    for t: trans in rep$elements(s) do
        tagcase t.next
           tag state:
           tag output (x: output): yield(t.input, x.size, x.out)
           end
        end
    end all_outputs

% Add a transition from the given state to an output condition for the given input. An "all
% others" lookahead list is implicit for this transition, so the addition is illegal if a transition for
% the given input and an "all others" lookahead list already exists.

add_output = proc (from: cvt, input: char, size: int, out: string) signals (illegal)
    for t: trans in rep$elements(from) do
        if t.input = input
           then tagcase t.next
                   tag state (ps: pstate):
                       if ~string$empty(ps.peeks)
                          then continue end
                       peeks: string := ""
                       for x: trans in rep$elements(down(ps.state)) do
                           peeks := string$append(peeks, x.input)
                           end
                       ps.peeks := peeks
                   tag output:
                       signal illegal
                   end
           end
        end
    rep$addh(from, trans${input: input,
                          next: arc$make_output(output${size: size,
                                                        out: out})})
    end add_output

% Return the next state for the given input and lookahead. Signal no_match if no transition is
% possible. Signal output if an output condition is reached.

move = proc (s: cvt, input, peek: char) returns (state) signals (no_match, output(int, string))
    for t: trans in rep$elements(s) do
        if t.input = input
           then tagcase t.next
                   tag state (ps: pstate):
                       if string$empty(ps.peeks) cor string$indexc(peek, ps.peeks) > 0
                          then return(ps.state) end
                   tag output (x: output):
                       signal output(x.size, x.out)
                   end
           end
        end
    signal no_match
    end move

% Return the next state for the given input with no further input available. Signal no_match if
% no transition is possible. Signal output if an output condition is reached.

move1 = proc (s: cvt, input: char) returns (state) signals (no_match, output(int, string))
    for t: trans in rep$elements(s) do
        if t.input = input
           then tagcase t.next
                   tag state (ps: pstate): if string$empty(ps.peeks)
                                              then return(ps.state) end
                   tag output (x: output): signal output(x.size, x.out)
                   end
           end
        end
    signal no_match
    end move1

end state

% Remove chars in USING from chars in FROM.

strip = proc (from, using: string) returns (string)
    for c: char in string$chars(using) do
        i: int := string$indexc(c, from)
        if i > 0
           then from := string$substr(from, 1, i - 1) || string$rest(from, i + 1) end
        end
    return(from)
    end strip
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